Ligation-based detection of genetic variants

ABSTRACT

The present invention provides assay systems and methods for detection of genetic variants in a sample, including copy number variation and single nucleotide polymorphisms. The invention preferably employs the technique of tandem ligation (e.g., the ligation of two or more fixed sequence oligonucleotides and one or more bridging oligonucleotides complementary to a region between the fixed sequence oligonucleotides) combined with detection of levels of particular genomic regions using array hybridization.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority to U.S. Ser. No. 61/371,605, filed Aug. 6, 2010; U.S. Ser. No. 13/013,732, filed Jan. 25, 2011; U.S. Ser. No. 13/205,603, filed Aug. 8, 2011; U.S. Ser. No. 13/205,570, filed Aug. 8, 2011; and U.S. Ser. No. 13/293,419, filed Nov. 10, 2011, each of which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

This invention relates to multiplexed selection, amplification, and detection of targeted regions from a genetic sample.

BACKGROUND OF THE INVENTION

In the following discussion certain articles and methods will be described for background and introductory purposes. Nothing contained herein is to be construed as an “admission” of prior art. Applicant expressly reserves the right to demonstrate, where appropriate, that the articles and methods referenced herein do not constitute prior art under the applicable statutory provisions.

Genetic abnormalities account for a wide range of pathologies, including pathologies caused by chromosomal aneuploidy (e.g., Down's syndrome), germline mutations in specific genes (e.g., sickle cell anemia), and pathologies caused by somatic mutations (e.g., cancer). Diagnostic methods for determining such genetic anomalies have become standard techniques for identifying specific diseases and disorders, as well as providing valuable information on disease source and treatment options.

Copy number variations (CNVs) are alterations of genomic DNA that correspond to relatively large regions of the genome that have been deleted or amplified on certain chromosomes. CNVs can be caused by genomic rearrangements such as deletions, duplications, inversions, and translocations. Copy number variation has been associated with various forms of cancer (Cappuzzo F, Hirsch, et al. (2005) 97 (9): 643-655), neurological disorders (Sebat, J., et al. (2007) Science 316 (5823): 445-9), including autism (Sebat, J., et al. (2007) Science 316 (5823): 445-9), and schizophrenia (St Clair D (2008) Schizophr Bull 35 (1): 9-12). Detection of copy number variants of a chromosome of interest or a portion thereof in a specific cell population can be a powerful tool to identify genetic diagnostic or prognostic indicators of a disease or disorder.

Detection of copy number variation is also useful in detecting chromosomal aneuploidies in fetal DNA. Conventional methods of prenatal diagnostic testing currently require removal of a sample of fetal cells directly from the uterus for genetic analysis, using either chorionic villus sampling (CVS) between 11 and 14 weeks gestation or amniocentesis after 15 weeks. However, these invasive procedures carry a risk of miscarriage of around 1% (Mujezinovic and Alfirevic, Obstet Gynecol 2007; 110:687-694). A reliable and convenient method for non-invasive prenatal diagnosis has long been sought to reduce this risk of miscarriage and allow earlier testing.

Single nucleotide polymorphisms (SNPs) are single nucleotide differences at specific regions of the genome. The average human genome typically has more than three million SNPs when compared to a reference genome. SNPs have been associated with various diseases, including cancer, cardiovascular disease, cystic fibrosis, and diabetes. Detection of SNPs can be a powerful tool to identify genetic diagnostic or prognostic indicators of a disease or disorder. It is often desirable to detect many different SNPs in the same sample.

Re-sequencing is the use of DNA sequence detection, often in a portion of the genome. Re-sequencing can be applied towards the analysis of a genetic sample from any source including mammals, other animal species, plants, bacteria, viruses, and the like. Re-sequencing can be used for many applications including but not limited to clinical applications and environmental applications. One use of re-sequencing for clinical applications is the determination of the DNA sequence in a disease-causing gene. Examples of gene re-sequencing for medical diagnostic or prognostic indications include the re-sequencing of BRCA1 and BRCA2 for breast cancer risk. An example of an environmental application would be the detection of a specific pathogen in a water source.

There is thus a need for methods of screening for copy number variations, SNPs and re-sequencing that employ an efficient, reproducible multiplexed assay. The present invention addresses this need.

SUMMARY OF THE INVENTION

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Other features, details, utilities, and advantages of the claimed subject matter will be apparent from the following written Detailed Description including those aspects illustrated in the accompanying drawings and defined in the appended claims.

The present invention provides assay systems and methods for detection of copy number variation, polymorphisms, mutations and re-sequencing. The invention employs the technique of selecting genomic regions using fixed sequence oligonucleotides and joining them via ligation and/or extension. In a preferred aspect this is accomplished by tandem ligation, i.e., the ligation of two or more non-adjacent, fixed sequence oligonucleotides and a bridging oligonucleotide that is complementary to a region between and directly adjacent to the portion of the nucleic acid region of interest complementary to the fixed sequence oligonucleotides.

In one general aspect, the invention provides an assay system for detecting a nucleic acid region of interest in a genetic sample, comprising the steps of providing a genetic sample; introducing a first and second fixed sequence oligonucleotide to the genetic sample under conditions that allow the fixed sequence oligonucleotides to specifically hybridize to complementary regions in the nucleic acid of interest; introducing one or more bridging oligonucleotides under conditions that allow the bridging oligonucleotides to specifically hybridize to complementary regions in the nucleic acid of interest, wherein the one or more bridging oligonucleotides are complementary to a region of the nucleic acid between and immediately adjacent to the region complementary to the first and second fixed sequence oligonucleotides; ligating the hybridized oligonucleotides to create a contiguous ligation product complementary to the nucleic acid region of interest; amplifying the contiguous ligation product to create amplification products having the sequence of the nucleic acid region; and detecting and quantifying the amplification products, wherein detection of the amplification product provides detection of the nucleic acid region in the genetic sample. The amplification products are optionally isolated and quantified to determine the relative frequency of the nucleic acid region in the genetic sample.
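The hybridize-then-ligate logic of this aspect can be pictured with a small in silico model. The sketch below is illustrative only and is not the assay itself: the function names `reverse_complement` and `tandem_ligate` are hypothetical, hybridization is modeled as exact string matching, and only the first match of the first fixed sequence oligonucleotide is considered.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def tandem_ligate(template: str, first: str, bridge: str, second: str):
    """Return the contiguous ligation product, or None if the oligos do not tile the template.

    All oligos are written 5'->3' and are complementary to the template,
    so together they must spell out part of the template's reverse complement
    with no gaps: first oligo, then bridging oligo, then second oligo.
    """
    target = reverse_complement(template)   # strand that the oligos read along
    start = target.find(first)              # simplified: exact match, first hit only
    if start < 0:
        return None
    product = first + bridge + second
    # Ligation is only possible if the bridge and second oligo sit immediately adjacent.
    return product if target.startswith(product, start) else None


if __name__ == "__main__":
    region = "GGATCCAAGT" + "ACGTA" + "TTGACCGGTA"
    template = reverse_complement(region)
    print(tandem_ligate(template, "GGATCCAAGT", "ACGTA", "TTGACCGGTA"))
```

A bridging oligonucleotide that does not match the template between the two fixed sequence oligonucleotides yields no product in this model, which mirrors the requirement that all hybridized oligonucleotides be immediately adjacent before ligation.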

In another general aspect, the invention provides an assay system for detecting a nucleic acid region of interest in a genetic sample, comprising the steps of providing a genetic sample; introducing a first and second fixed sequence oligonucleotide to the genetic sample under conditions that allow the fixed sequence oligonucleotides to specifically hybridize to complementary regions in the nucleic acid of interest; introducing one or more bridging oligonucleotides complementary to a region of the nucleic acid of interest between the regions complementary to the first and second fixed sequence oligonucleotides under conditions that allow the bridging oligonucleotides to specifically hybridize to the nucleic acid of interest, wherein at least one or more bases on either or both ends of the bridging oligonucleotide are not immediately adjacent to the fixed sequence oligonucleotides; extending the one or more bridging oligonucleotides so that the bridging oligonucleotides are immediately adjacent to the fixed sequence oligonucleotides; ligating the hybridized and extended oligonucleotides to create a contiguous ligation product; amplifying the contiguous ligation product to create amplification products having the sequence of the nucleic acid region of interest; and detecting and quantifying the amplification products, wherein detection of the amplification product provides detection of the nucleic acid region in the genetic sample. The amplification products are optionally isolated and quantified to determine the relative frequency of the nucleic acid region in the genetic sample.

The relative frequency of the nucleic acid in the sample can be used to determine not only copy number variation for that particular nucleic acid region, but also in conjunction with and/or in comparison to other nucleic acids, it may be used to determine the copy number variation of larger genomic regions, including chromosomes.

The fixed sequence oligonucleotides used in the assay system preferably comprise universal primer regions that are used in amplification of the contiguous ligation product. Alternatively, the universal primer sequences can be added to the contiguous ligation products following the ligation of the hybridized fixed sequence and bridging oligonucleotides, e.g., through the introduction of adapters comprising such universal primer sequences to the ends of the contiguous ligation product.

The bridging oligonucleotides are preferably shorter oligonucleotides, preferably between 1-10 nucleotides and more preferably between 3-7 nucleotides, and can be designed to provide degeneracy within the sequence of the bridging oligonucleotides, e.g., the bridging oligonucleotides are provided as full or partial randomers with various sequence variations to ensure detection of the selected nucleic acid region even if the region contains a polymorphic residue. The degeneracy of the bridging oligonucleotide can be determined based on the predicted polymorphisms that may be present in the selected nucleic acid region. Alternatively, the pool of bridging oligonucleotides used in a reaction can provide degeneracy for one or more positions of the bridging oligonucleotide. In one aspect, the pool of bridging oligonucleotides used in a reaction can provide degeneracy for each position of the bridging oligonucleotide. In yet another aspect, the pool of bridging molecules used in a reaction can provide degeneracy for each internal position of the bridging oligonucleotide, with the nucleotides adjacent to the ligation sites remaining constant in the pool of bridging oligonucleotides used within the set. In another aspect, the bridging oligo is longer than 10 nucleotides and preferably 18-30 nucleotides. In a preferred aspect, a single bridging oligonucleotide complementary to a region of the nucleic acid of interest is hybridized between the region complementary to the first and second fixed sequence oligonucleotides. In another aspect, two or more bridging oligonucleotides are hybridized within the region between the fixed sequence oligonucleotides, and preferably the bridging oligonucleotides hybridize to adjacent regions on the nucleic acid of interest. In this situation, ligation occurs between the fixed sequence oligonucleotides and the adjacent bridging oligonucleotides as well as between adjacent bridging oligonucleotides. In another aspect, there are one or more bases between the serial bridging oligonucleotides and/or one or more bases between the bridging oligonucleotides and fixed sequence oligonucleotides. These gaps can be extended, e.g., by use of polymerase and dNTPs prior to ligation.
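To give a sense of how large such degenerate pools become, the following minimal sketch enumerates a pool of bridging oligonucleotides that is degenerate at every internal position while the bases next to the ligation sites stay constant. The function name `bridging_pool` and its parameters are hypothetical and chosen for illustration; a fully degenerate pool is obtained by leaving both fixed ends empty.

```python
from itertools import product


def bridging_pool(length: int, five_prime: str = "", three_prime: str = ""):
    """Enumerate bridging oligos of a given length.

    `five_prime` and `three_prime` are the constant bases adjacent to the
    ligation sites; every remaining (internal) position is fully degenerate.
    """
    internal = length - len(five_prime) - len(three_prime)
    if internal < 0:
        raise ValueError("fixed ends are longer than the oligo")
    return [five_prime + "".join(middle) + three_prime
            for middle in product("ACGT", repeat=internal)]


if __name__ == "__main__":
    # A 5-mer degenerate at every position versus one with constant end bases.
    print(len(bridging_pool(5)))
    print(len(bridging_pool(5, five_prime="A", three_prime="G")))
```

Pool size grows as 4 raised to the number of degenerate positions, which is one reason short bridging oligonucleotides keep a fully degenerate pool manageable.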

It is an advantage that using degenerate bridging oligonucleotides obviates the need to predetermine the maternal and fetal polymorphic content for a selected nucleic acid region prior to employing the detection methods of the assay system.

In one aspect of the invention, the first and second fixed sequence oligonucleotides are introduced to the genetic sample and specifically hybridized to the complementary portions of the nucleic acids of interest prior to introduction of the bridging oligonucleotides. The hybridized regions are optionally isolated following the specific hybridization of the fixed sequence oligonucleotides to remove any excess unbound oligonucleotides in the reaction.

In another aspect, the bridging oligonucleotides are introduced to the genetic sample at the same time the fixed sequence oligonucleotides are introduced, and all are allowed to hybridize to a contiguous portion of the nucleic acid region of interest.

In certain aspects, the fixed sequence oligonucleotides of the invention comprise one or more indices. These indices may serve as surrogate sequences for the identification of the nucleic acid region of interest, a locus, or a particular allele of a locus. In particular, these indices may serve as surrogate detection sequences for the detection of hybridization of the nucleic acid region of interest to an array. Other indices may be used to associate an amplification product with a particular sample, or to identify experimental error within the assay methods. In particular assays, the amplification product from the contiguous ligation product is identified and quantified using one or more indices as a surrogate to the actual sequence of the amplification product.
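As a rough sketch of how indices can stand in for the full product sequence during quantification, the example below tallies products by a sample index and a locus index. Everything specific here is an assumption made for illustration: the index sequences, the index length of six, the layout (sample index followed directly by locus index at the start of each product), and the function name `count_by_index`.

```python
from collections import Counter

# Hypothetical index tables; real index sequences and lengths are assay-specific.
SAMPLE_INDEX = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
LOCUS_INDEX = {"GGATCC": "locus_A", "CCTAGG": "locus_B"}


def count_by_index(products, index_len: int = 6) -> Counter:
    """Count amplification products per (sample, locus) using only their indices.

    Assumed layout: sample index, then locus index, then the rest of the product.
    Products with an unrecognized index are skipped rather than guessed at.
    """
    counts = Counter()
    for seq in products:
        sample = SAMPLE_INDEX.get(seq[:index_len])
        locus = LOCUS_INDEX.get(seq[index_len:2 * index_len])
        if sample is None or locus is None:
            continue
        counts[(sample, locus)] += 1
    return counts


if __name__ == "__main__":
    demo = ["ACGTACGGATCCTTTT", "ACGTACGGATCCAAAA", "TGCATGCCTAGGAAAA"]
    print(count_by_index(demo))
```

Because only the index positions are inspected, the remainder of each product never needs to be read to assign it to a sample and locus.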

In specific assay systems, the first or second fixed sequence oligonucleotide comprises an allele index that associates a specific allele with that complementary fixed sequence oligonucleotide.

In other specific aspects of the invention, the fixed sequence oligonucleotides are used for comparative hybridization of genomic regions, e.g., genomic regions corresponding to a particular locus or chromosome.

In certain aspects of the invention, an assay system employs two index sequences that allow direct comparison of levels of particular genomic regions in a sample using array hybridization.

In a general aspect of the invention, a method is provided for detecting a variance in the frequency of a genomic region in a genetic sample. This method comprises the steps of providing a maternal sample; introducing at least two sets of first and second fixed sequence oligonucleotides to specifically hybridize to complementary regions in nucleic acid regions of interest, wherein each set comprises an oligonucleotide associated with an optically detectable label, and wherein both sets comprise a region that binds selectively to a single array feature; ligating the hybridized oligonucleotides to create contiguous ligation products complementary to nucleic acid regions of interest; introducing the contiguous ligation products from both sets to an array comprising one or more features complementary to the contiguous ligation products; and detecting the hybridization of the contiguous ligation products from the first and second set by detection of the optically detectable labels; wherein the relative frequency of the optically detectable labels on the array is indicative of the presence or absence of a variance in the frequency of a nucleic acid region of interest in the genetic sample.

In another general aspect of the invention, a method is provided for detecting regions of interest corresponding to a first and second chromosome in a genetic sample. This method comprises the steps of providing a genetic sample; introducing at least two sets of first and second fixed sequence oligonucleotides to the genetic sample under conditions that allow the sets of fixed sequence oligonucleotides to specifically hybridize to complementary regions in nucleic acid regions of interest, wherein the first set of fixed sequence oligonucleotides is complementary to a genomic region on a first chromosome and the second set of fixed sequence oligonucleotides is complementary to a genomic region on a second chromosome, and wherein each set comprises an oligonucleotide associated with an optically detectable label, and wherein both sets comprise a region that binds selectively to a single array feature; ligating the hybridized oligonucleotides to create contiguous ligation products complementary to nucleic acid regions of interest; introducing the contiguous ligation products from both sets to an array comprising one or more features complementary to the contiguous ligation products; and detecting hybridization of the contiguous ligation products from the first and second set to the array by detection of the optically detectable labels; wherein the relative frequency of the optically detectable labels on the array is indicative of the presence or absence of a variance in the frequency of a first and second chromosome in the genetic sample.

In certain specific aspects, the method is carried out for one to 10,000 nucleic acid regions of interest on a chromosome, such as two to 1,000 nucleic acid regions of interest or any intervening range.

In certain specific aspects, hybridization of contiguous ligation products comprises hybridization to individual oligonucleotides bound to the array.

In certain aspects, variance is detected by an alteration of the expected ratio of nucleic acids of interest in the genetic sample. In certain specific aspects the variance is detected by an increased or decreased level of hybridization of one set of contiguous ligation products as compared to a second set of contiguous ligation products.
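One simple way to picture this comparison is to average the log ratio of the two label signals across array features and ask whether it departs from the expected ratio. The sketch below is an assumption-laden illustration, not the method of the invention: the function names, the use of a mean log2 ratio, and the `tolerance` cutoff of 0.05 are all hypothetical choices.

```python
from math import log2
from statistics import mean


def label_ratio_shift(set1_signal, set2_signal, expected_ratio: float = 1.0) -> float:
    """Mean log2 deviation of the observed set1/set2 signal ratio from the expected ratio.

    Each position in the two lists is one array feature measured in both labels.
    """
    if len(set1_signal) != len(set2_signal) or not set1_signal:
        raise ValueError("need paired, non-empty signal lists")
    return mean(log2((a / b) / expected_ratio)
                for a, b in zip(set1_signal, set2_signal))


def has_variance(set1_signal, set2_signal,
                 expected_ratio: float = 1.0, tolerance: float = 0.05) -> bool:
    """Flag a variance when the mean log2 shift exceeds a (hypothetical) tolerance."""
    return abs(label_ratio_shift(set1_signal, set2_signal, expected_ratio)) > tolerance


if __name__ == "__main__":
    print(has_variance([110.0, 220.0], [100.0, 200.0]))  # set 1 runs about 10% high
    print(has_variance([100.0, 200.0], [100.0, 200.0]))  # matches the expected ratio
```

In practice a cutoff would be set from replicate measurements rather than fixed in advance; the constant here only keeps the example self-contained.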

In another general aspect of the invention, an assay system is provided for detecting a nucleic acid region of interest in a maternal sample comprising both maternal and fetal cell free DNA. This assay system comprises the steps of providing a maternal sample comprising cell free DNA from both maternal and fetal sources; introducing a first and second non-adjacent, fixed sequence oligonucleotide to the genetic sample under conditions that allow the fixed sequence oligonucleotides to specifically hybridize to complementary regions in the nucleic acid of interest; introducing one or more bridging oligonucleotides under conditions that allow the bridging oligonucleotides to specifically hybridize to complementary regions in the nucleic acid of interest, wherein one or more bridging oligonucleotides are complementary to a region of the nucleic acid between and immediately adjacent to the region complementary to the first and second fixed sequence oligonucleotides; ligating the hybridized oligonucleotides to create a contiguous ligation product complementary to the nucleic acid region of interest; amplifying the contiguous ligation product to create amplification products having the sequence of the nucleic acid region; and detecting and quantifying the amplification products; wherein quantification of the amplification product provides a relative frequency of the nucleic acid region in the maternal sample.

The relative frequency of the nucleic acid in the sample can be used to determine not only copy number variation for that particular nucleic acid region, but also in conjunction with and/or in comparison to other nucleic acids, it may be used to determine the copy number variation of larger genomic regions, including chromosomal imbalance between maternal and fetal nucleic acid regions due to aneuploidy in the fetus.
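The size of the imbalance to be detected can be estimated with a simple mixture calculation. The sketch below assumes, purely for illustration, that cell free DNA from each source contributes in proportion to chromosome copy number and that the reference chromosome is disomic in both mother and fetus; the function name `expected_dosage_ratio` and its parameters are hypothetical.

```python
def expected_dosage_ratio(fetal_fraction: float,
                          fetal_copies: int = 3,
                          maternal_copies: int = 2,
                          reference_copies: int = 2) -> float:
    """Expected test/reference chromosome ratio in a maternal/fetal DNA mixture.

    `fetal_fraction` is the fraction of cell free DNA that is fetal (0 to 1).
    The default copy numbers model a fetal trisomy with a euploid mother.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    mixed = (1.0 - fetal_fraction) * maternal_copies + fetal_fraction * fetal_copies
    return mixed / reference_copies


if __name__ == "__main__":
    # With 10% fetal DNA, a fetal trisomy shifts the ratio by only about 5%.
    print(expected_dosage_ratio(0.10))
```

Under these assumptions the shift for a trisomy is half the fetal fraction, so small fetal fractions call for precise quantification of many regions per chromosome.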

The invention also provides compositions that are useful in ligation-based nucleic acid detection assays such as those of the present invention. Accordingly, the invention provides sets of oligonucleotides for ligation-based detection of a nucleic acid region of interest, comprising a first oligonucleotide that comprises sequences complementary to the sequences of a first portion of a nucleic acid region, a universal primer sequence, and optionally one or more indices; a second oligonucleotide that comprises sequences complementary to the sequence of a second portion of a nucleic acid region and a universal primer sequence; and one or more bridging oligonucleotides that are complementary to the region immediately adjacent and between the nucleic acid region complementary to the first and second oligonucleotides. In certain aspects, the set of oligonucleotides comprises two or more bridging oligonucleotides with the ability to identify different polymorphisms within the nucleic acid of interest. In other aspects, the bridging molecules provide degeneracy for each position of the bridging oligonucleotide. In yet other aspects, the bridging molecules provide degeneracy for each internal position of the bridging oligonucleotide, with the nucleotides adjacent to the ligation sites remaining constant in the pool of bridging oligonucleotides used within the set.

These aspects and other features and advantages of the invention are described in more detail below.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates a first general schematic for a ligation-based assay system of the invention.

FIG. 2 illustrates a second general schematic for a ligation-based assay system of the invention.

FIG. 3 illustrates a multiplexed assay system for detection of two or more regions of interest.

FIG. 4 illustrates a first multiplexed assay system for detection of two or more alleles within a region of interest.

FIG. 5 illustrates a second multiplexed assay system for detection of two or more alleles within a region of interest.

FIG. 6 illustrates a third multiplexed assay system for detection of two or more alleles within a region of interest.

FIG. 7 illustrates a fourth multiplexed assay system for detection of two or more alleles within a region of interest.

FIG. 8 illustrates a fifth multiplexed assay system for detection of two or more alleles within a region of interest.

FIG. 9 illustrates a first general schematic for an assay system utilizing oligo extension in a ligation-based assay system of the invention.

FIG. 10 illustrates a second general schematic for an assay system utilizing oligo extension in a ligation-based assay system of the invention.

FIG. 11 illustrates an assay system utilizing a single fixed sequence oligonucleotide.

FIG. 12 illustrates a first comparative hybridization scheme using oligonucleotides that selectively hybridize to loci from two different chromosomes.

FIG. 13 illustrates a second comparative hybridization scheme using oligonucleotides that selectively hybridize to different alleles of a locus.

FIG. 14 illustrates the genotyping performance that is obtained using one exemplary assay format.

FIG. 15 is a graph illustrating the ability of the assay system to determine percent fetal DNA in a maternal sample.

DEFINITIONS

The terms used herein are intended to have the plain and ordinary meaning as understood by those of ordinary skill in the art. The following definitions are intended to aid the reader in understanding the present invention, but are not intended to vary or otherwise limit the meaning of such terms unless specifically indicated.

The term “allele index” refers generally to a series of nucleotides that corresponds to a specific SNP. The allele index may contain additional nucleotides that allow for the detection of deletion, substitution, or insertion of one or more bases. The index may be combined with any other index to create one index that provides information for two properties (e.g., sample-identification index, allele-locus index).

The term “binding pair” means any two molecules that specifically bind to one another using covalent and/or non-covalent binding, and which can be used for attachment of genetic material to a substrate. Examples include, but are not limited to, ligands and their protein binding partners, e.g., biotin and avidin, biotin and streptavidin, an antibody and its particular epitope, and the like.

The term “chromosomal abnormality” refers to any genetic variant for all or part of a chromosome. The genetic variants may include but not be limited to any copy number variant such as duplications or deletions, translocations, inversions, and mutations.

The terms “complementary” or “complementarity” are used in reference to nucleic acid molecules (i.e., a sequence of nucleotides) that are related by base-pairing rules. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and with appropriate nucleotide insertions or deletions, pair with at least about 90% to about 95% complementarity, and more preferably from about 98% to about 100% complementarity, and even more preferably with 100% complementarity. Alternatively, substantial complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Selective hybridization conditions include, but are not limited to, stringent hybridization conditions. Stringent hybridization conditions will typically include salt concentrations of less than about 1 M, more usually less than about 500 mM and preferably less than about 200 mM. Hybridization temperatures are generally at least about 2° C. to about 6° C. lower than melting temperatures (T_m).
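The percentage thresholds in this definition can be made concrete with a short calculation. The sketch below is a simplification: it compares two equal-length strands in antiparallel orientation without the insertions or deletions that the definition allows during optimal alignment, and the function names and the 90% default cutoff are hypothetical choices for illustration.

```python
# Watson-Crick pairs for DNA and RNA, as listed in the definition.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that base-pair when two 5'->3' strands align antiparallel.

    Simplification: strands must be the same length and no gaps are considered.
    """
    if len(strand_a) != len(strand_b) or not strand_a:
        raise ValueError("strands must be non-empty and of equal length")
    antiparallel = strand_b.upper()[::-1]  # read strand_b 3'->5' against strand_a
    paired = sum((x, y) in PAIRS for x, y in zip(strand_a.upper(), antiparallel))
    return 100.0 * paired / len(strand_a)


def is_substantially_complementary(strand_a: str, strand_b: str,
                                   threshold: float = 90.0) -> bool:
    """Apply a percentage cutoff (90% by default) to the ungapped comparison."""
    return percent_complementarity(strand_a, strand_b) >= threshold


if __name__ == "__main__":
    print(percent_complementarity("ACGT", "ACGT"))              # self-complementary
    print(percent_complementarity("AAAAAAAAAA", "TTTTTTTTTG"))  # one mismatch in ten
```

A gapped alignment would be needed to reproduce the full definition; the ungapped version is enough to show how a single mismatch in a ten-base duplex lands exactly on the 90% boundary.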

The term “correction index” refers to an index that may contain additional nucleotides that allow for identification and correction of amplification, sequencing or other experimental errors including the detection of deletion, substitution, or insertion of one or more bases during sequencing as well as nucleotide changes that may occur outside of sequencing such as oligo synthesis, amplification, and any other aspect of the assay.

The term “diagnostic tool” as used herein refers to any composition or assay of the invention used in combination as, for example, in a system in order to carry out a diagnostic test or assay on a patient sample.

The term “genetic sample” refers to any sample comprising all or a portion of the genetic information of an organism, including but not limited to viruses, bacteria, fungi, plants and animals, and in particular mammals. The genetic information that can be interrogated within a genetic sample includes genomic DNA (both coding and non-coding regions), mitochondrial DNA, RNA, and nucleic acid products derived from each of these. Such nucleic acid products include cDNA created from mRNA or products of pre-amplification to increase the material for analysis.

The term “hybridization” generally means the reaction by which the pairing of complementary strands of nucleic acid occurs. DNA is usually double-stranded, and when the strands are separated they will re-hybridize under the appropriate conditions. Hybrids can form between DNA-DNA, DNA-RNA or RNA-RNA. They can form between a short strand and a long strand containing a region complementary to the short one. Imperfect hybrids can also form, but the more imperfect they are, the less stable they will be (and the less likely to form).

The term “identification index” refers generally to a series of nucleotides that are incorporated into an oligonucleotide during oligonucleotide synthesis for identification purposes. Identification index sequences are preferably 6 or more nucleotides in length. In a preferred aspect, the identification index is long enough to have statistical probability of labeling each molecule with a target sequence uniquely. For example, if there are 3000 copies of a particular target sequence, there are substantially more than 3000 identification indexes such that each copy of a particular target sequence is likely to be labeled with a unique identification index. The identification index may contain additional nucleotides that allow for identification and correction of sequencing errors including the detection of deletion, substitution, or insertion of one or more bases during sequencing as well as nucleotide changes that may occur outside of sequencing such as oligo synthesis, amplification, and any other aspect of the assay. The index may be combined with any other index to create one index that provides information for two properties (e.g., sample-identification index, allele-locus index).

The term “identification index” refers generally to a series of nucleotides incorporated into a primer region of an amplification process for unique identification of an amplification product of a nucleic acid region. Identification index sequences are preferably 6 or more nucleotides in length. In a preferred aspect, the identification index is long enough to have statistical probability of labeling each molecule with a target sequence uniquely. For example, if there are 3000 copies of a particular target sequence, there are substantially more than 3000 identification indexes such that each copy of a particular target sequence is likely to be labeled with a unique identification index. The identification index may contain additional nucleotides that allow for identification and correction of sequencing errors including the detection of deletion, substitution, or insertion of one or more bases during sequencing as well as nucleotide changes that may occur outside of sequencing such as oligo synthesis, amplification, and any other aspect of the assay. The index may be combined with any other index to create one index that provides information for two properties (e.g., sample-identification index, locus-identification index).
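How much longer than the copy number the index pool must be can be estimated with a birthday-problem calculation. The sketch below assumes, as a simplification not stated in the text, that each copy receives an index drawn uniformly at random from all 4^L sequences of length L; the function name `prob_all_unique` is hypothetical.

```python
from math import exp, log1p


def prob_all_unique(copies: int, index_length: int) -> float:
    """Probability that every copy receives a distinct index.

    Assumes each index is drawn uniformly at random from the 4**index_length
    possible sequences (a simplifying assumption for illustration).
    """
    pool = 4 ** index_length
    if copies > pool:
        return 0.0  # pigeonhole: more copies than available indexes
    # Product of (1 - i/pool) for i = 0..copies-1, summed in log space for stability.
    return exp(sum(log1p(-i / pool) for i in range(copies)))


if __name__ == "__main__":
    for length in (6, 12, 16):
        print(length, prob_all_unique(3000, length))
```

Running the loop for 3000 copies shows the probability of fully unique labeling rising with index length, which illustrates why the pool of indexes needs to be substantially larger than the number of copies rather than merely equal to it.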

As used herein the term “ligase” refers generally to a class of enzymes, DNA ligases (typically T4 DNA ligase), which can link pieces of DNA together. The pieces must have compatible ends (either both blunt or with mutually compatible sticky ends), and the reaction requires ATP. “Ligation” is the process of joining two pieces of DNA together.

The terms “locus” and “loci” as used herein refer to a nucleic acid region or regions of known location in a genome.

The term “locus index” refers generally to a series of nucleotides that correspond to a given genomic locus. In a preferred aspect, the locus index is long enough to label each target sequence region uniquely. For instance, if the method uses 192 target sequence regions, there are at least 192 unique locus indexes, each uniquely identifying each target region. The locus index may contain additional nucleotides that allow for identification and correction of sequencing errors including the detection of deletion, substitution, or insertion of one or more bases during sequencing as well as nucleotide changes that may occur outside of sequencing such as oligo synthesis, amplification, and any other aspect of the assay. The index may be combined with any other index to create one index that provides information for two properties (e.g., sample-identification index, allele-locus index).
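The lower bound on locus index length follows directly from counting sequences. As a minimal sketch, the hypothetical helper `min_index_length` below returns the shortest length L for which 4^L distinct indexes cover the required number of loci; it deliberately ignores the additional error-correcting nucleotides mentioned in the definition.

```python
def min_index_length(n_loci: int) -> int:
    """Smallest index length L (at least 1) such that 4**L >= n_loci.

    Counts sequences only; any extra nucleotides for error detection or
    correction would add to this minimum.
    """
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    length = 1
    while 4 ** length < n_loci:
        length += 1
    return length


if __name__ == "__main__":
    # 4**3 = 64 is too few for 192 target regions; 4**4 = 256 is enough.
    print(min_index_length(192))
```

For the 192-region example in the definition, three nucleotides give only 64 possible indexes, so four is the minimum before any error-correcting bases are added.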

The term “maternal sample” as used herein refers to any sample taken from a pregnant mammal which comprises both fetal and maternal cell free DNA. Preferably, maternal samples for use in the invention are obtained through relatively non-invasive means, e.g., phlebotomy or other standard techniques for extracting peripheral samples from a subject.

The term “melting temperature” or T_m is commonly defined as the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the T_m of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the T_m value may be calculated by the equation: T_m = 81.5 + 16.6(log10[Na+]) + 0.41(%[G+C]) − 675/n − 1.0m, when a nucleic acid is in aqueous solution having cation concentrations of 0.5 M or less, the (G+C) content is between 30% and 70%, n is the number of bases, and m is the percentage of base pair mismatches (see, e.g., Sambrook J et al., Molecular Cloning, A Laboratory Manual, 3rd Ed., Cold Spring Harbor Laboratory Press (2001)). Other references include more sophisticated computations, which take structural as well as sequence characteristics into account for the calculation of T_m.
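The simple estimate can be written out as a short function. The sketch below assumes the standard form of the equation with a plus sign before the 0.41(%[G+C]) term; the function name `melting_temperature` and its parameter names are hypothetical, and the function does not enforce the stated validity ranges (cation concentration of 0.5 M or less, G+C content between 30% and 70%).

```python
from math import log10


def melting_temperature(sequence: str, sodium_molar: float,
                        mismatch_percent: float = 0.0) -> float:
    """Estimate T_m in degrees C from length, G+C content, [Na+] and mismatches.

    T_m = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 675/n - 1.0*m
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    if sodium_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return (81.5 + 16.6 * log10(sodium_molar) + 0.41 * gc_percent
            - 675.0 / n - 1.0 * mismatch_percent)


if __name__ == "__main__":
    # A 20-mer with 50% G+C in 0.1 M sodium, perfectly matched.
    print(round(melting_temperature("ACGT" * 5, 0.1), 2))
```

Each percentage point of mismatch lowers the estimate by one degree, and shorter oligonucleotides are penalized through the 675/n term.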

“Microarray” or “array” refers to a solid phase support having a surface, preferably but not exclusively a planar or substantially planar surface, which carries an array of sites containing nucleic acids such that each site of the array comprises substantially identical or identical copies of oligonucleotides or polynucleotides and is spatially defined and not overlapping with other member sites of the array; that is, the sites are spatially discrete. The array or microarray can also comprise a non-planar interrogatable structure with a surface such as a bead or a well. The oligonucleotides or polynucleotides of the array may be covalently bound to the solid support, or may be non-covalently bound. Conventional microarray technology is reviewed in, e.g., Schena, Ed., Microarrays: A Practical Approach, IRL Press, Oxford (2000). “Array analysis”, “analysis by array” or “analysis by microarray” refers to analysis, such as, e.g., sequence analysis, of one or more biological molecules using a microarray.

The term “oligonucleotides” or “oligos” as used herein refers to linear oligomers of natural or modified nucleic acid monomers, including deoxyribonucleotides, ribonucleotides, anomeric forms thereof, peptide nucleic acid monomers (PNAs), locked nucleic acid monomers (LNAs), and the like, or a combination thereof, capable of specifically binding to a single-stranded polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomeric units, e.g., 8-12, to several tens of monomeric units, e.g., 100-200 or more. Suitable nucleic acid molecules may be prepared by the phosphoramidite method described by Beaucage and Carruthers (Tetrahedron Lett., 22:1859-1862 (1981)), or by the triester method according to Matteucci, et al. (J. Am. Chem. Soc., 103:3185 (1981)), both incorporated herein by reference, or by other chemical methods such as using a commercial automated oligonucleotide synthesizer.

As used herein “nucleotide” refers to a base-sugar-phosphate combination. Nucleotides are monomeric units of a nucleic acid sequence (DNA and RNA). The term nucleotide includes ribonucleoside triphosphates ATP, UTP, CTP, GTP and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleotide derivatives that confer nuclease resistance on the nucleic acid molecule containing them. The term nucleotide as used herein also refers to dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives. Illustrated examples of dideoxyribonucleoside triphosphates include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP.

According to the present invention, a “nucleotide” may be unlabeled or detectably labeled by well known techniques. Fluorescent labels and their attachment to oligonucleotides are described in many reviews, including Haugland, Handbook of Fluorescent Probes and Research Chemicals, 9th Ed., Molecular Probes, Inc., Eugene Oreg. (2002); Keller and Manak, DNA Probes, 2nd Ed., Stockton Press, New York (1993); Eckstein, Ed., Oligonucleotides and Analogues: A Practical Approach, IRL Press, Oxford (1991); Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26:227-259 (1991); and the like. Other methodologies applicable to the invention are disclosed in the following sample of references: Fung et al., U.S. Pat. No. 4,757,141; Hobbs, Jr., et al., U.S. Pat. No. 5,151,507; Cruickshank, U.S. Pat. No. 5,091,519; Menchen et al., U.S. Pat. No. 5,188,934; Begot et al., U.S. Pat. No. 5,366,860; Lee et al., U.S. Pat. No. 5,847,162; Khanna et al., U.S. Pat. No. 4,318,846; Lee et al., U.S. Pat. No. 5,800,996; Lee et al., U.S. Pat. No. 5,066,580; Mathies et al., U.S. Pat. No. 5,688,648; and the like. Labeling can also be carried out with quantum dots, as disclosed in the following patents and patent publications: U.S. Pat. Nos. 6,322,901; 6,576,291; 6,423,551; 6,251,303; 6,319,426; 6,426,513; 6,444,143; 5,990,479; 6,207,392; 2002/0045045; and 2003/0017264.
Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels and enzyme labels. Fluorescent labels of nucleotides may include but are not limited to fluorescein, 5-carboxyfluorescein (FAM), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4′-dimethylaminophenylazo)benzoic acid (DABCYL), CASCADE BLUE® (pyrenyloxytrisulfonic acid), OREGON GREEN™ (2′,7′-difluorofluorescein), TEXAS RED™ (sulforhodamine 101 acid chloride), Cyanine and 5-(2′-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS). Specific examples of fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP available from Perkin Elmer, Foster City, Calif.; FluoroLink DeoxyNucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink Fluor X-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP available from Amersham, Arlington Heights, Ill.; Fluorescein-15-dATP, Fluorescein-12-dUTP, Tetramethyl-rhodamine-6-dUTP, IR770-9-dATP, Fluorescein-12-ddUTP, Fluorescein-12-UTP, and Fluorescein-15-2′-dATP available from Boehringer Mannheim, Indianapolis, Ind.; and Chromosomee Labeled Nucleotides, BODIPY-FL-14-UTP, BODIPY-FL-4-UTP, BODIPY-TMR-14-UTP, BODIPY-TMR-14-dUTP, BODIPY-TR-14-UTP, BODIPY-TR-14-dUTP, CASCADE BLUE®-7-UTP (pyrenyloxytrisulfonic acid-7-UTP), CASCADE BLUE®-7-dUTP (pyrenyloxytrisulfonic acid-7-dUTP), fluorescein-12-UTP, fluorescein-12-dUTP, OREGON GREEN™ 488-5-dUTP (2′,7′-difluorofluorescein-5-dUTP), RHODAMINE GREEN™-5-UTP ((5-{2-[4-(aminomethyl)phenyl]-5-(pyridin-4-yl)-1H-1-5-UTP)), RHODAMINE GREEN™-5-dUTP ((5-{2-[4-(aminomethyl)phenyl]-5-(pyridin-4-yl)-1H-i-5-dUTP)), tetramethylrhodamine-6-UTP, tetramethylrhodamine-6-dUTP, TEXAS RED™-5-UTP (sulforhodamine 101 acid chloride-5-UTP), TEXAS RED™-5-dUTP (sulforhodamine 101 acid chloride-5-dUTP), and TEXAS RED™-12-dUTP (sulforhodamine 101 acid chloride-12-dUTP) available from Molecular Probes, Eugene, Oreg.

As used herein the term “polymerase” refers to an enzyme that links individual nucleotides together into a long strand, using another strand as a template. There are two general types of polymerase: DNA polymerases, which synthesize DNA, and RNA polymerases, which synthesize RNA. Within these two classes, there are numerous sub-types of polymerases, depending on what type of nucleic acid can function as template and what type of nucleic acid is formed.

As used herein “polymerase chain reaction” or “PCR” refers to a technique for replicating a specific piece of target DNA in vitro, even in the presence of excess non-specific DNA. Primers are added to the target DNA, where the primers initiate the copying of the target DNA using nucleotides and, typically, Taq polymerase or the like. By cycling the temperature, the target DNA is repetitively denatured and copied. A single copy of the target DNA, even if mixed in with other, random DNA, can be amplified to obtain billions of replicates. The polymerase chain reaction can be used to detect and measure very small amounts of DNA and to create customized pieces of DNA. In some instances, linear amplification methods may be used as an alternative to PCR.

The term “polymorphism” as used herein refers to any genetic changes or variants in a locus that may be indicative of that particular locus, including but not limited to single nucleotide polymorphisms (SNPs), methylation differences, short tandem repeats (STRs), and the like.

Generally, a “primer” is an oligonucleotide used to, e.g., prime DNA extension, ligation and/or synthesis, such as in the synthesis step of the polymerase chain reaction or in the primer extension techniques used in certain sequencing reactions. A primer may also be used in hybridization techniques as a means to provide complementarity of a nucleic acid region to a capture oligonucleotide for detection of a specific nucleic acid region.

The term “research tool” as used herein refers to any composition or assay of the invention used for scientific enquiry, academic or commercial in nature, including the development of pharmaceutical and/or biological therapeutics. The research tools of the invention are not intended to be therapeutic or to be subject to regulatory approval; rather, the research tools of the invention are intended to facilitate research and aid in such development activities, including any activities performed with the intention to produce information to support a regulatory submission.

The term “sequencing” as used herein refers generally to any and all biochemical methods that may be used to determine the order of nucleotide bases, including but not limited to adenine, guanine, cytosine and thymine, in one or more molecules of DNA. As used herein the term “sequence determination” means using any method of sequencing known in the art to determine the sequence of nucleotide bases in a nucleic acid.

The term “sample index” refers generally to a series of unique nucleotides (i.e., each sample index is unique), and can be used to allow for multiplexing of samples in a single reaction vessel such that each sample can be identified based on its sample index. In a preferred aspect, there is a unique sample index for each sample in a set of samples, and the samples are pooled during sequencing. For example, if twelve samples are pooled into a single sequencing reaction, there are at least twelve unique sample indexes such that each sample is labeled uniquely. The sample index may contain additional nucleotides that allow for identification and correction of sequencing errors, including the detection of deletion, substitution, or insertion of one or more bases during sequencing, as well as nucleotide changes that may occur outside of sequencing, such as during oligo synthesis, amplification, and any other aspect of the assay. The index may be combined with any other index to create one index that provides information for two properties (e.g., sample-identification index, allele-locus index).
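By way of illustration only, the use of a sample index to identify pooled samples while tolerating a sequencing error might be sketched as follows. The index sequences, sample names, and the nearest-match rule are hypothetical and are not taken from the specification. The sketch handles substitutions only; detecting insertions or deletions would require a different index design and matching scheme.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_index: str, sample_indexes: dict, max_mismatches: int = 1):
    """Return the sample whose index lies within max_mismatches of read_index.

    Returns None when no index, or more than one index, is that close, so
    ambiguous reads are discarded rather than guessed.
    """
    hits = [name for name, idx in sample_indexes.items()
            if hamming(read_index, idx) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 6-base sample indexes; every pair differs at all six positions,
# so a single substitution can be corrected unambiguously.
SAMPLE_INDEXES = {
    "sample_1": "ACGTAC",
    "sample_2": "TGCATG",
    "sample_3": "GTACGT",
}
```

With these assumed indexes, a read carrying the index `ACGTAA` (one substitution relative to `ACGTAC`) is still attributed to `sample_1`, while `AAAAAA` matches no index closely enough and is left unassigned.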

DETAILED DESCRIPTION OF THE INVENTION

The practice of the techniques described herein may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and sequencing technology, which are within the skill of those who practice in the art. Such conventional techniques include polymer array synthesis, hybridization and ligation of polynucleotides, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Green, et al., Eds. (1999), Genome Analysis: A Laboratory Manual Series (Vols. I-IV); Weiner, Gabriel, Stephens, Eds. (2007), Genetic Variation: A Laboratory Manual; Dieffenbach, Dveksler, Eds. (2003), PCR Primer: A Laboratory Manual; Bowtell and Sambrook (2003), DNA Microarrays: A Molecular Cloning Manual; Mount (2004), Bioinformatics: Sequence and Genome Analysis; Sambrook and Russell (2006), Condensed Protocols from Molecular Cloning: A Laboratory Manual; and Sambrook and Russell (2002), Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.) W.H. Freeman, New York N.Y.; Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3^(rd) Ed., W. H. Freeman Pub., New York, N.Y.; and Berg et al. (2002) Biochemistry, 5^(th) Ed., W.H. Freeman Pub., New York, N.Y., all of which are herein incorporated in their entirety by reference for all purposes.

Note that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an allele” refers to one or more copies of an allele with various sequence variations, and reference to “the assay system” includes reference to equivalent steps and methods known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All publications mentioned herein are incorporated by reference for the purpose of describing and disclosing devices, formulations and methodologies that may be used in connection with the presently described invention.

Where a range of values is provided, it is understood that each intervening value between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described in order to avoid obscuring the invention.

The Invention in General

The invention provides assay systems to identify copy number variants of nucleic acid regions (including loci, sets of loci and larger genomic regions, e.g., chromosomes), mutations, and polymorphisms in a genetic sample and/or to select a portion of a genetic sample for re-sequencing.

In one aspect, the assay system utilizes methods to selectively identify and/or isolate selected nucleic acid regions from two or more genomic regions of interest (e.g., chromosomes or loci) in a genetic sample, and allows determination of an atypical copy number of a particular genomic region based on the comparison between the numbers of detected nucleic acid regions from the two or more chromosomes in the genetic sample or by comparison to one or more reference chromosomes from the same or a different sample.
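The comparison described above can be illustrated with a deliberately simple sketch that compares the mean number of detections per selected region on a chromosome of interest with the mean on a reference chromosome. The function name and the counts are hypothetical; the specification does not prescribe this statistic, and a practical analysis would require normalization and a statistical test that this sketch omits.

```python
from statistics import mean


def chromosome_ratio(counts_of_interest, counts_reference):
    """Ratio of mean per-region counts: chromosome of interest / reference.

    Each argument is a sequence of detection counts, one per selected
    nucleic acid region on the respective chromosome. A ratio close to 1.0
    suggests comparable representation; a marked deviation may warrant
    further analysis. This is an illustration, not a diagnostic criterion.
    """
    if not counts_of_interest or not counts_reference:
        raise ValueError("both chromosomes need at least one region count")
    reference_mean = mean(counts_reference)
    if reference_mean == 0:
        raise ValueError("reference chromosome has no detected regions")
    return mean(counts_of_interest) / reference_mean
```

For instance, with hypothetical counts of 105, 95, 110 and 90 on the chromosome of interest and 100 per region on the reference chromosome, the ratio is 1.0.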

More particularly, the assay system utilizes a tandem ligation method comprising the use of first and second non-adjacent oligonucleotides of fixed sequence complementary to a selected nucleic acid region on a chromosome of interest or a reference chromosome, and one or more short, bridging oligonucleotides (also called “splint” oligos) complementary to the region between and immediately adjacent to the first and second oligonucleotides. Hybridization of these three or more oligonucleotides to a selected nucleic acid of interest, followed by ligation of these three or more oligonucleotides, provides a contiguous template for further amplification, detection and quantification of this region. The amplified regions may be quantified directly from the amplification reactions, or they are optionally isolated and identified to quantify the number of selected nucleic acid regions in a sample.
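The requirement that the fixed sequence and bridging oligonucleotides hybridize immediately adjacent to one another can be checked in silico. The sketch below is an assumption-laden illustration: it considers only the target-complementary portions of the oligos, requires perfect complementarity, and uses hypothetical function names. It returns the ligated product only when the oligos, joined in 5′ to 3′ order, are complementary to one contiguous stretch of the target.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def ligated_product(target: str, oligos):
    """Join oligos (given in 5'->3' order) and verify contiguous hybridization.

    Returns the concatenated product if its reverse complement occurs as one
    uninterrupted stretch of the target strand, i.e. no gap separates the
    hybridized oligos; otherwise returns None.
    """
    product = "".join(o.upper() for o in oligos)
    if product and reverse_complement(product) in target.upper():
        return product
    return None
```

As a usage example with a hypothetical 25-base target `GGATCCAGTCAGTTTACGCAAGCTT`, the oligos `CTTGCGTA`, `AACTG` (bridging) and `ACTGGAT` tile a 20-base stretch and yield a product, whereas omitting the bridging oligo leaves a gap and yields `None`.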

In specific aspects, the tandem ligation methods use fixed sequence oligonucleotides with a set of two or more contiguous, adjacent bridging oligonucleotides that hybridize to the region of the nucleic acid between the regions complementary to the fixed sequence oligonucleotides. These bridging oligonucleotides hybridize adjacent to one another and to the fixed sequence oligonucleotides. The contiguous bridging oligonucleotides are ligated during the ligation reaction with the fixed sequence oligonucleotides and with each other, resulting in a single contiguous template for further amplification and sequence determination.

In other aspects of the invention, the assay system uses a set of oligonucleotides that bind to non-adjacent regions within a nucleic acid region of interest, and primer extension is utilized to create a contiguous set of hybridized oligos prior to the tandem ligation step. In such aspects, the assay system utilizes a tandem ligation method comprising the use of first and second non-adjacent oligonucleotides of fixed sequence complementary to a selected nucleic acid region on a chromosome of interest or a reference chromosome, and one or more short, bridging oligonucleotides complementary to the region between the first and second oligonucleotides but not immediately adjacent to one or the other fixed sequence oligonucleotide. Hybridization of these three or more oligonucleotides to a selected nucleic acid of interest is followed by an extension reaction using dNTPs and a polymerase to create a set of adjacent hybridized oligonucleotides, and ligation of the adjacent hybridized oligos. The combination of extension and ligation provides a contiguous template for further amplification, detection and quantification of this region. The amplified regions may be quantified directly from the amplification reactions, or they are optionally isolated and identified to quantify the number of selected nucleic acid regions in a sample.

In specific aspects, the tandem ligation methods use fixed sequence oligonucleotides with a set of two or more sequential but non-adjacent bridging oligonucleotides that hybridize to the region of the nucleic acid between the regions complementary to the fixed sequence oligonucleotides. The “gap” regions between the fixed sequence oligonucleotides and the bridging oligos and/or between the sequential bridging oligonucleotides are ligated during the ligation reaction, resulting in a single contiguous template for further amplification and sequence determination.

In preferred aspects of the invention, the nucleic acids from the genetic sample are associated with a substrate, e.g., using binding pairs to attach the genetic material to a substrate surface. Briefly, a first member of a binding pair (e.g., biotin) can be associated with a nucleic acid of interest, and the associated nucleic acid attached to a substrate comprising a second member of a binding pair (e.g., avidin or streptavidin) on its surface. This can be particularly useful in removing any unhybridized oligonucleotides following specific binding of the fixed sequence oligonucleotides and/or the bridging oligonucleotides to the nucleic acid of interest. Briefly, the attached nucleic acids can be hybridized to the oligonucleotides, and the surface preferably treated to remove any unhybridized oligonucleotides, e.g., by washing or other removal methods such as degradation of such oligonucleotides as discussed in Willis et al., U.S. Pat. Nos. 7,700,323 and 6,858,412.

There are a number of methods that may be used in the association of a nucleic acid via binding pair interactions, as will be apparent to one skilled in the art upon reading the present specification. For example, numerous methods may be used for labeling the nucleic acids of a genetic sample with biotin, including random photobiotinylation, end-labeling with biotin, replicating with biotinylated nucleotides, and replicating with a biotin-labeled primer.

In a preferred aspect, the assay system of the invention employs a multiplexed reaction with a set of three or more such oligonucleotides for each selected nucleic acid region. This general aspect is illustrated in FIG. 1. Each set of oligonucleotides preferably contains two oligonucleotides 101, 103 of fixed sequence and one or more bridging oligonucleotides 113. Each of the fixed sequence oligonucleotides comprises a region complementary to the selected nucleic acid region 105, 107, and preferably universal primer sequences 109, 111, i.e., oligo regions complementary to universal primers. These universal primer sequences 109, 111 are used to amplify the different selected nucleic acid regions following ligation of the hybridized fixed sequence oligonucleotides and the bridging oligonucleotide. The universal primer sequences are located at or near the ends of the fixed sequence oligonucleotides 101, 103, and thus preserve the nucleic acid-specific sequences in the products of any universal amplification methods. Amplification products can be detected by determination of the sequence of the products, e.g., through sequence determination or hybridization, e.g., to an array or a bead-based detection system such as the Luminex™ bead-based assay (Invitrogen, Carlsbad, Calif.) or the BeadXpress™ assay (Illumina, San Diego, Calif.).

In one aspect of the assay systems of the invention, the fixed sequence oligonucleotides 101, 103 are introduced 102 to the genetic sample 100 and allowed to specifically bind to the complementary portions of the nucleic acid region of interest 115. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligonucleotide is then introduced and allowed to bind 104 to the region of the selected nucleic acid region 115 between the first 101 and second 103 fixed sequence oligonucleotides. Alternatively, the bridging oligo can be introduced simultaneously with the fixed sequence oligonucleotides. The bound oligonucleotides are ligated 106 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest. Following ligation, universal primers 117, 119 are introduced to amplify 108 the ligated template region to create 110 products 121 that comprise the sequence of the nucleic acid region of interest. These products 121 are optionally isolated, detected, and quantified to provide information on the presence and amount of the selected nucleic acid region in a genetic sample. Preferably, the products are detected and quantified through sequence determination of the product, and in particular sequence determination of the region of the product corresponding to the selected nucleic acid region.

The number of selected nucleic acid regions analyzed for each chromosome in the assay system of the invention may vary from 2 to 20,000 or more per chromosome analyzed. In a preferred aspect, the number of targeted regions is between 48 and 480. In another aspect, the number of targeted regions is at least 100. In another aspect, the number of targeted regions is at least 400. In another aspect, the number of targeted regions is at least 1000.

In certain aspects, the bridging oligos can be composed of a mixture of oligos with degeneracy in each of the positions, so that the mixture of randomers used will be compatible with all reactions in the multiplexed assay requiring a bridging oligo of the given length. In another aspect, the bridging oligos can be of various lengths so that the mixture of oligos will be compatible with particular tandem ligation reactions in the multiplexed assay requiring bridging oligos of the given lengths.

In yet another aspect the bridging oligo can have partial degeneracy and the multiplexed tandem ligation reactions are restricted to those that require the specific sequences provided by the degeneracy of the bridging oligos. For example, a set of tandem ligation reactions may require only A and C bases in the bridging oligo, and a mixture of bridging oligos synthesized with only A and C bases would be provided for these particular tandem ligation reactions in a multiplexed assay.

In yet another aspect, the bridging oligo sequences are designed such that only those assays that have the given specific sequences in the bridging region would be multiplexed in the assay system. In one example the bridging oligo is a randomer, where all combinations of the bridging oligo are synthesized. As an example, in the case where a 5-base oligo is used, the number of unique bridging oligos would be 4^5 = 1024. This would be independent of the number of targeted regions since all possible bridging oligos would be present in the reaction.

In another example the bridging oligo is specific, synthesized to match the sequences in the gap. As an example, in the case where a 5-base oligo is used, the number of unique oligos synthesized would be equal to or less than the number of targeted regions. A number less than the number of targeted regions could be achieved if the gap sequence was shared between two or more targeted regions. In one aspect of this example, one might purposefully choose the targeted sequences and especially the gap sequences such that there was as much identical overlap as possible in the gap sequences, minimizing the number of bridging oligos necessary for the multiplexed reaction.
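The effect of shared gap sequences on the number of specific bridging oligos can be sketched as follows. The gap sequences are hypothetical, the function names are illustrative, and the sketch assumes each bridging oligo is the perfect reverse complement of its gap on the target strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def specific_bridging_oligos(gap_sequences):
    """Distinct bridging oligos needed to fill the given gap sequences.

    gap_sequences holds one gap (target strand, 5'->3') per targeted region.
    Regions that share a gap sequence share a bridging oligo, so the size of
    the returned set never exceeds the number of targeted regions.
    """
    return {reverse_complement(gap) for gap in gap_sequences}


# Four hypothetical targeted regions; the first and third share a gap.
GAPS = ["ACGTA", "TTGCA", "ACGTA", "GGCAT"]
```

With these assumed gaps, four targeted regions require only three distinct 5-base bridging oligos.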

In another aspect, the sequences of the bridging oligos are designed and the nucleic acid regions are selected so that all selected nucleic acid regions share the same base(s) at each end of the bridging oligo. For instance, one might choose selected nucleic acids and their gap locations such that all of the gaps share an “A” base at the first position and a “G” base at the last position of the gap. Any combination of a first and last base could be utilized, based upon factors such as the genome investigated, the likelihood of sequence variation in that area, and the like. In a specific aspect of this example, the bridging oligos can be synthesized with random degeneracy of bases at the internal positions of the bridging oligo and specific bases added at the first and last positions. In the case of a 5-mer, the second, third and fourth positions would be randomly provided, and two specific nucleotides would be added at the two end positions. In this case, the number of unique bridging oligos would be 4^3 = 64.
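The pool sizes given for full randomers, partially degenerate oligos, and oligos with fixed first and last bases can be reproduced by direct enumeration. The following is a minimal sketch; the function name and parameters are illustrative assumptions, and the fixed bases "A" and "G" in the usage example are placeholders rather than a recommended design.

```python
from itertools import product


def bridging_oligo_pool(length: int, first: str = "", last: str = "",
                        alphabet: str = "ACGT"):
    """Enumerate every bridging oligo of the given length.

    first/last -- optional fixed bases at the 5' and 3' ends
    alphabet   -- bases allowed at the degenerate (internal) positions
    """
    inner = length - len(first) - len(last)
    if inner < 0:
        raise ValueError("fixed ends are longer than the requested oligo")
    # One oligo per combination of bases at the degenerate positions.
    return [first + "".join(core) + last
            for core in product(alphabet, repeat=inner)]
```

Under these assumptions, `bridging_oligo_pool(5)` enumerates the 4^5 = 1024 full randomers, `bridging_oligo_pool(5, first="A", last="G")` enumerates the 4^3 = 64 oligos with fixed ends, and `bridging_oligo_pool(5, alphabet="AC")` enumerates the 2^5 = 32 oligos containing only A and C bases.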

In the human genome the frequency of the dinucleotide CG is much lower than expected by the respective mononucleotide frequencies. This presents an opportunity to enhance the specificity of an assay with a particular mixture of bridging oligos. In this aspect, the bridging oligos may be selected to have a 5′ G and a 3′ C. This base selection allows each oligo to have a high frequency in the human genome but makes it a rare event for two bridging oligos to hybridize adjacent to each other. The probability is then reduced that multiple oligos are ligated in locations of the genome that are not targeted in the assay.
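The reasoning above can be made concrete with two small helper functions. The first computes the commonly used observed/expected ratio for the CG dinucleotide in a sequence; the second shows that two oligos with a 5′ G and a 3′ C, when hybridized adjacent to each other, create a CG dinucleotide at their junction. Because CG is its own reverse complement, the target strand must also contain CG at that site. The function names and example oligos are hypothetical.

```python
def cpg_observed_expected(sequence: str) -> float:
    """Observed/expected ratio of the CG dinucleotide in a sequence.

    Expected count is (number of C x number of G) / length, so the ratio is
    (CG count x length) / (C count x G count). Returns 0.0 when the
    sequence contains no C or no G.
    """
    seq = sequence.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def junction_dinucleotide(upstream_oligo: str, downstream_oligo: str) -> str:
    """Dinucleotide formed where two adjacently hybridized oligos meet.

    Both oligos are written 5'->3'; the junction joins the 3' base of the
    upstream oligo to the 5' base of the downstream oligo.
    """
    return upstream_oligo[-1].upper() + downstream_oligo[0].upper()
```

For two hypothetical bridging oligos `GATAC` and `GTTAC`, each with a 5′ G and a 3′ C, the junction dinucleotide is `CG`, which is the under-represented dinucleotide referred to above.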

The bridging oligo is preferably added to the reaction after the fixed sequence oligonucleotides have been hybridized, and following the optional removal of all unhybridized fixed sequence oligonucleotides. The conditions of the hybridization reaction are preferably optimized near the T_(m) of the bridging oligo to prevent erroneous hybridization of oligos that are not fully complementary to the nucleic acid region. If the bridging oligos have a T_(m) significantly lower than the fixed sequence oligonucleotides, the bridging oligos are preferably added as a part of the ligase reaction.

The advantage of using short oligos is that ligation on either end would likely occur only when all bases of the bridging oligo match the gap sequence. A further advantage of short bridging oligos is that the number of different oligos necessary could be less than the number of targeted sites, raising the oligos' effective concentration to allow perfect matches to happen faster. Using fewer oligos also has advantages in cost and quality control. The advantages of using fixed first and last bases with random bases in between include the ability to utilize longer bridging oligos for better specificity while reducing the number of total bridging oligos in the reaction.

Use of Indices in the Assay Systems of the Invention

In certain aspects, all or a portion of the nucleic acids of interest are directly detected using the described techniques. In certain aspects, however, the nucleic acids of interest are associated with one or more indices that are identifying for a selected nucleic acid region or a particular sample being analyzed. The detection of the one or more indices can serve as a surrogate detection mechanism of the selected nucleic acid region, or as confirmation of the presence of a particular selected nucleic acid region if both the index and the sequence of the nucleic acid region itself are determined. These indices are preferably associated with the selected nucleic acids during an amplification step using primers that comprise both the index and sequence regions that specifically hybridize to the nucleic acid region.

In one example, the primers used for amplification of a selected nucleic acid region are designed to provide a locus index between the selected nucleic acid region primer region and a universal amplification region. The locus index is unique for each selected nucleic acid region and representative of a locus on a chromosome of interest or reference chromosome, so that quantification of the locus index in a sample provides quantification data for the locus and the particular chromosome containing the locus.

In another aspect, the primers used for amplification of the selected nucleic acid regions to be analyzed for a genetic sample are designed to provide a random index between the selected nucleic acid region primer region and a universal amplification region. In such an aspect, a sufficient number of identification indices are present to uniquely identify each selected nucleic acid region in the sample. Each nucleic acid region to be analyzed is associated with a unique identification index, so that the identification index is uniquely associated with the selected nucleic acid region. Quantification of the identification index in a sample provides quantification data for the associated selected nucleic acid region and the chromosome corresponding to the selected nucleic acid region. The identification index may also be used to detect any amplification bias that occurs downstream of the initial isolation of the selected nucleic acid regions from a sample.

In certain aspects, only the locus index and/or the identification index (if present) are detected and used to quantify the selected nucleic acid regions in a sample. In another aspect, a count of the number of times each locus index occurs with a unique identification index is done to determine the relative frequency of a selected nucleic acid region in a sample.
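One possible reading of the counting step described above is sketched below: for each locus index, the number of distinct identification indices observed with it is counted, so that repeated detections of the same index pair (for example, amplification duplicates) are counted once. This interpretation, the function name, and the example index values are assumptions made for illustration.

```python
from collections import defaultdict


def locus_relative_frequencies(reads):
    """Relative frequency of each locus from (locus_index, identification_index) pairs.

    Each distinct identification index seen with a locus index is counted
    once, so duplicate pairs do not inflate the count for that locus.
    """
    unique_ids = defaultdict(set)
    for locus_index, identification_index in reads:
        unique_ids[locus_index].add(identification_index)
    total = sum(len(ids) for ids in unique_ids.values())
    return {locus: len(ids) / total for locus, ids in unique_ids.items()}


# Hypothetical detected index pairs; the first pair was detected twice.
READS = [("L1", "AAT"), ("L1", "AAT"), ("L1", "CGA"), ("L2", "TTG")]
```

With these assumed reads, locus `L1` is credited with two distinct molecules and locus `L2` with one, giving relative frequencies of 2/3 and 1/3.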

The primers are preferably designed so that indices comprising identifying information are coded at the ends of the primer flanking the region complementary to the nucleic acid of interest. The indices are non-complementary but unique sequences used within the primer to provide information relevant to the selected nucleic acid region that is isolated and/or amplified using the primer. The advantage of this is that information on the presence and quantity of the selected nucleic acid region can be obtained without the need to determine the actual sequence itself, although in certain aspects it may be desirable to do so. Generally, however, the ability to identify and quantify a selected nucleic acid region through identification of one or more indices will decrease the length of sequencing required, as the locus information is captured at the 3′ or 5′ end of the isolated selected nucleic acid region. Use of indices as a surrogate for identification of selected nucleic acid regions may also reduce error, since longer sequencing reads are more prone to the introduction of error.

In addition to locus-specific indices and identification indices, additional indices can be introduced to primers to assist in the multiplexing of samples. In addition, indices which identify sequencing error, which allow for highly multiplexed amplification techniques or which allow for hybridization or ligation or attachment to another surface can be added to the primers. The order and placement of these indices, as well as the length of these indices, can vary.

The primers used for identification and quantification of a selected nucleic acid region may be associated with regions complementary to the 5′ end of the selected nucleic acid region, regions complementary to the 3′ end of the selected nucleic acid region, or in certain amplification regimes the indices may be present on one or both of a set of amplification primers complementary to the selected nucleic acid region. The primers can be used to multiplex the analysis of multiple selected nucleic acid regions to be analyzed within a sample, and can be used either in solution or on a solid substrate, e.g., on a microarray or on a bead. These primers may be used for linear replication or amplification, or they may create circular constructs for further analysis.

Thus, in some aspects one or both of the fixed sequence oligonucleotides further contain an index region. This index region may comprise a number of different sequences that can be used to identify the selected nucleic acid region and/or the sample being analyzed in the assay system. Preferably, the index region corresponds to the selected nucleic acid region, so that identification of the index region can be used as a surrogate for detection of the actual sequence of the selected nucleic acid region. The index region may optionally comprise a sample index to correspond the oligo set to a particular genetic sample in a multiplexed assay system.

FIG. 2 illustrates the use of a single index region 221 on a first fixed sequence oligonucleotide 201 in an oligo set for a selected nucleic acid region. The fixed sequence oligonucleotides 201, 203 are introduced 202 to the genetic sample 200 and allowed to specifically bind to the selected nucleic acid region 215. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligo is then introduced and allowed to hybridize 204 to the region of the selected nucleic acid region 215 between the first 201 and second 203 fixed sequence oligonucleotides. The bound oligonucleotides are ligated 206 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest. Following ligation, universal primers 217, 219 are introduced to amplify 208 the ligated template region to create 210 products 223 that comprise the sequence of the nucleic acid region of interest. These products 223 are optionally isolated, detected, and/or quantified to provide information on the presence and amount of the selected nucleic acid region in a genetic sample. Preferably, the products are detected and quantified through sequence determination of the index, thus obviating the need for determining the actual sequences of the selected nucleic acid region. In other aspects, however, it is desirable to determine the product comprising sequences of both the index and the selected nucleic acid region, for example, to provide internal confirmation of the results or where the index provides sample information and is not informative of the selected nucleic acid region. In another aspect, the index permits unique hybridization to a feature on an array, such hybridization leading to the detection and quantification of the sequences.

The use of indices is especially useful in a multiplexed assay setting where two or more different selected nucleic acid regions are being simultaneously detected in a genetic sample. FIG. 3 illustrates an example where two different selected nucleic acid regions are detected in a single tandem reaction assay. Two sets of fixed sequence oligonucleotides (301 and 303, 323 and 325) that specifically hybridize to two different nucleic acid regions 315, 331 are introduced 302 to a genetic sample and allowed to hybridize 304 to the respective nucleic acid regions. Each set comprises an oligonucleotide 301, 323 having a sequence specific region 305, 327, a universal primer region 309 and an index region 321, 335. The other fixed sequence oligonucleotide of each set comprises a sequence specific region 307, 329 and a universal primer region 311. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligos 313, 333 are introduced to the hybridized fixed sequence oligonucleotide/nucleic acid regions and allowed to hybridize 306 to these regions. Although shown in FIG. 3 as two different bridging oligos, in fact the same bridging oligo may be suitable for both hybridization events, or they may be two oligos from a pool of degenerate oligos that are used with multiple tandem ligation events. The bound oligonucleotides are ligated 308 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest. Following ligation, universal primers 317, 319 are introduced to amplify 310 the ligated template regions to create 312 amplification products 337, 339 that comprise the sequence of the nucleic acid regions of interest. These products 337, 339 are optionally isolated, detected and/or quantified to provide information on the presence and amount of the selected nucleic acid region in a genetic sample.

In multiplexed assay systems, the products are detected and quantified through sequence determination of the different indices, thus obviating the need for determining the actual sequences of the selected nucleic acid region. In other aspects, however, the index may be a sample specific index as well as a region specific index, and thus the index may not only identify the nucleic acid region, but may also provide information on both the nucleic acid region and the genetic sample from which the region was obtained. Alternatively, the nucleic acid region of the product may be detected, for example, to provide internal confirmation of the results or where the index provides solely sample information and is not informative of the selected nucleic acid region.
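As a rough illustration of index-based quantification in a multiplexed setting, the sketch below tallies reads per (sample, region) pair by looking up a sample index and a locus index at the start of each read. The index sequences, their lengths, their order within the read, and the region labels are all hypothetical choices made for the example; the specification does not fix any of them.

```python
from collections import Counter

# Hypothetical lookup tables: index sequence -> sample or region label.
SAMPLE_INDEX = {"AAC": "sample_1", "GGT": "sample_2"}
LOCUS_INDEX = {"CATG": "region_315", "TGCA": "region_331"}

def tally_by_index(reads, sample_len=3, locus_len=4):
    """Count reads per (sample, region) using only the index bases.

    Assumes, for illustration, that each read begins with the sample
    index immediately followed by the locus index. Reads with an
    unrecognized index are skipped rather than guessed at.
    """
    tally = Counter()
    for read in reads:
        sample = SAMPLE_INDEX.get(read[:sample_len])
        locus = LOCUS_INDEX.get(read[sample_len:sample_len + locus_len])
        if sample is None or locus is None:
            continue
        tally[(sample, locus)] += 1
    return tally

reads = ["AACCATGNNNN", "AACCATGNNNN", "GGTTGCANNNN", "TTTCATGNNNN"]
tally = tally_by_index(reads)
```

Only the first seven bases of each read are inspected here, which reflects the point made above that the index can stand in for the sequence of the selected nucleic acid region.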

Detection of Polymorphic Regions Using the Ligation-Based Assay System

In certain aspects, the assay system of the invention detects one or more regions that comprise a polymorphism. This methodology is not primarily designed to identify a particular allele, e.g., as maternal versus fetal, but rather to ensure that different alleles corresponding to a nucleic acid region of interest are included in the quantification methods of the invention. In certain aspects, however, it may be desirable both to use the information to count all such nucleic acid regions in the genetic sample and to use the information on specific polymorphisms, e.g., to calculate the amount of fetal DNA contained within a maternal sample, or to identify the percentage of alleles with a particular mutation in a genetic sample from a cancer patient. Thus, the invention is intended to encompass both mechanisms for detection of SNP-containing nucleic acid regions for direct determination of copy number variant through quantification as well as detection of SNPs for ensuring overall efficiency of the assay.

Thus, in a particular aspect of the invention, allele-discrimination is provided through the bridging oligo. In this aspect, the bridging oligo is located over a SNP. In this aspect, the polymorphism is preferably located close enough to one end of a ligation reaction as to provide allele-specificity.

In one example of allele detection, both complementary allele bridging oligo variants are present in the same reaction mixture and allele detection results from subsequent sequencing through the polymorphism of the ligated products or their amplification products. FIG. 4 illustrates this aspect.

In FIG. 4, two fixed sequence oligonucleotides 401, 403 and bridging oligonucleotides corresponding to the two possible SNPs in the nucleic acid regions of interest 415, 429 are used in detection of the selected nucleic acid region, and preferably to detect the region in a single reaction. Each of the fixed sequence oligonucleotides comprises a region complementary to the selected nucleic acid region 405, 407, and universal primer sequences 409, 411 used to amplify the different selected nucleic acid regions following initial selection and/or isolation of the selected nucleic acid regions from the genetic sample. The universal primer sequences are located at the proximal ends of the fixed sequence oligonucleotides 401, 403, and thus preserve the nucleic acid-specific sequences in the products of any universal amplification methods. The fixed sequence oligonucleotides 401, 403 are introduced 402 to the genetic sample 400 and allowed to specifically bind to the selected nucleic acid region 415, 429. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligos corresponding to an A/T SNP 413 or a G/C SNP 433 are introduced and allowed to bind 404 to the region of the selected nucleic acid region 415, 429 between the first 401 and second 403 fixed sequence oligonucleotides. Alternatively, the bridging oligos 413, 433 can be introduced to the sample simultaneously with the fixed sequence oligonucleotides.

The bound oligonucleotides are ligated 406 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest. Following ligation, universal primers 417, 419 are introduced to amplify 408 the ligated template region to create 410 products 421, 423 that comprise the sequence of the nucleic acid region of interest representing both SNPs in the selected nucleic acid region. These products 421, 423 are detected and quantified through sequence determination of the product, and in particular the region of the product containing the SNP in the selected nucleic acid region.
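To make the idea of allele-specific bridging oligo variants concrete, the following sketch derives one bridging oligo per allele as the reverse complement of the target sequence lying between the two fixed sequence oligonucleotides. The gap sequence, SNP offset, and function names are hypothetical, and real oligo design would also have to consider melting temperature, length, and ligation chemistry, none of which is modeled here.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def bridging_variants(gap_sequence, snp_offset, alleles):
    """Build one bridging oligo per allele for the gap between fixed oligos.

    `gap_sequence` is the target-strand sequence between the two fixed
    sequence oligonucleotides (5' to 3'); `snp_offset` is the 0-based
    position of the SNP within it. Each variant is complementary to the
    target carrying the given allele at that position.
    """
    variants = {}
    for allele in alleles:
        target = gap_sequence[:snp_offset] + allele + gap_sequence[snp_offset + 1:]
        variants[allele] = reverse_complement(target)
    return variants

# Hypothetical 5-base gap with a C/T polymorphism at offset 1.
variants = bridging_variants("ACGTA", snp_offset=1, alleles=["C", "T"])
```

Because both variants can be present in the same reaction, the allele is read out afterwards by sequencing through the polymorphic position of the product, as described for FIG. 4.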

In another example, the allele detection results from the sequencing of a locus index or an allele index which is provided in one or both of the fixed sequence nucleic acid region oligonucleotides. The locus index and/or allele index is embedded in either the first or second fixed sequence oligonucleotide used in the set for a selected nucleic acid region containing a polymorphism, and is used with either a specific fixed sequence oligo or with a particular bridging oligo, either of which may be designed to detect the polymorphism. Detection of the locus index and/or the allele index in an amplification product allows detection of the presence, amount or absence of a specific allele present in a genetic sample, as well as the number of counts for the region through addition of the polymorphic regions detected in the sample. Two examples of how this may be performed are described in more detail below.

For example, in one aspect of the invention, two or more separate reactions are carried out using a single locus index and different bridging oligos corresponding to the different polymorphisms in the region complementary to the bridging oligos. The reactions are differentiated by the bridging oligo, and the ligation, amplification and detection reactions comprising the different bridging oligos remain separate through the detection step. The total counts for a particular nucleic acid region of interest can be determined mathematically using the locus index by adding the detected numbers of the counts for the nucleic acid region from the separate reactions comprising the bridging oligos having different polymorphic sequences.

This aspect may be useful for, e.g., circumstances in which both information on polymorphic frequency in a sample and information on total loci counts are desirable. Since the reactions are detected separately, only one index may be needed for detection in each of the separate reactions, although separate allele indices may also be used in the separate reactions.
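The arithmetic for combining separate reactions is simple addition, sketched below under the assumption that each reaction yields a single count for the locus index. The reaction labels and the counts are made-up values used only to show the calculation.

```python
def combine_reactions(counts_by_reaction):
    """Sum per-reaction locus-index counts and derive allele fractions.

    `counts_by_reaction` maps a label for the bridging oligo used in each
    separate reaction to the number of locus-index counts detected there.
    Returns the total count for the region and each reaction's share.
    """
    total = sum(counts_by_reaction.values())
    if total == 0:
        return 0, {}
    fractions = {label: n / total for label, n in counts_by_reaction.items()}
    return total, fractions

# Hypothetical counts from two reactions run with different bridging oligos.
total, fractions = combine_reactions({"A/T": 620, "G/C": 380})
```

The total gives the count for the nucleic acid region as a whole, while the fractions give the polymorphic frequency, which are the two quantities the paragraph above identifies as desirable.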

FIG. 5 illustrates one such aspect of the assay system of the invention. Two fixed sequence oligonucleotides 501, 503 and bridging oligonucleotides corresponding to the two possible SNPs in the selected nucleic acid region 515, 525 are used in detection of a nucleic acid region of interest. Each of the fixed sequence oligonucleotides comprises a region complementary to the selected nucleic acid region. The ligation, amplification, and detection steps of the assay system take place in two separate reactions, with a first reaction utilizing a first bridging oligo 513 and the second reaction utilizing a second bridging oligo 533. Both reactions utilize the same fixed sequence oligos 501, 503 having the same regions complementary to allele-specific regions 505, 507. A single locus index 521 can be used to detect the amplification products in each reaction so that sequence determination of the actual sequence of the nucleic acids of interest is not necessarily needed, although the sequence may still be determined to identify or provide confirmation of the sequence. The universal primer sequences 509, 511 are located at either end flanking the fixed sequence oligonucleotides 501, 503, and thus preserve the nucleic acid-specific sequences and the indices in the products of any universal amplification methods. The fixed sequence oligonucleotides 501, 503 are introduced 502 to the genetic sample 500 and allowed to specifically bind to the selected nucleic acid region 515, 525. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligos corresponding to an A/T SNP 513 or a G/C SNP 533 are introduced to each reaction and allowed to bind 504 to the region of the selected nucleic acid region 515, 525 between the first 505 and second 507 fixed sequence oligonucleotides. Alternatively, the bridging oligos 513, 533 can be introduced to the sample simultaneously with the fixed sequence oligonucleotides.

The bound oligonucleotides are ligated 506 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest. Following ligation, universal primers 517, 519 are introduced to amplify 508 the ligated template region to create 510 products 527, 529 that comprise the sequence of the nucleic acid region of interest representing both SNPs in the selected nucleic acid region. These products 527, 529 are detected and quantified through sequence determination of the product, and in particular the locus index combined with the knowledge of which bridging oligo was added to which reaction. The counts for the nucleic acid region as a whole can be determined through addition of the detected polymorphic regions in the genetic samples.

A different specific aspect of the invention utilizes allele indices to identify alleles comprising different polymorphisms as well as to determine counts of the nucleic acid region of interest. In a multiplexed reaction, locus indices may be combined with allele indices. In this aspect, two or more separate ligation reactions are carried out using two or more different bridging oligos corresponding to the different polymorphisms in the region complementary to the bridging oligos. The reactions are differentiated by the bridging oligo, and each bridging oligo is used with a fixed sequence oligo comprising an allele index that identifies that particular bridging oligo. Following the ligation step, the reactions can be combined either prior to amplification, since the same universal primers are preferably used, or prior to detection, as the different alleles can be distinguished through identification of the different allele-specific indices. The allele may also be distinguished through sequence determination of the allele index or alternatively from hybridizing of the allele index, and total counts for the nucleic acid region can be determined through the addition of the identified allelic regions.

In FIG. 6, two sets of fixed sequence oligonucleotides are used which comprise substantially the same sequence-specific regions 605, 607 but which comprise different indices 621, 623 on one of the fixed sequence oligonucleotides of the set. The ligation reactions are carried out with material from the same genetic sample 600, but in separate tubes with the different allele-specific oligo sets. The bridging oligonucleotides corresponding to the two possible SNPs in the selected nucleic acid region 613, 633 are used in detection of the selected nucleic acid region in each ligation reaction. Two allele indices 621, 623 that are indicative of the particular polymorphic alleles can be used to detect the amplification products so that sequence determination of the actual sequence of the nucleic acids of interest is not necessarily needed, although these sequences may still be determined to identify and/or provide confirmation of the sequence. Each of the fixed sequence oligonucleotides comprises a region complementary to the selected nucleic acid region 605, 607, and universal primer sequences 609, 611 used to amplify the different selected nucleic acid regions following initial selection and/or isolation of the selected nucleic acid regions from the genetic sample. The universal primer sequences are located at the ends of the fixed sequence oligonucleotides 601, 603, and 623 flanking the indices and the regions complementary to the nucleic acid of interest, thus preserving the nucleic acid-specific sequences and the allele indices in the products of any universal amplification methods. The fixed sequence oligonucleotides 601, 603, 623 are introduced 602 to an aliquot of the genetic sample 600 and allowed to specifically bind to the selected nucleic acid regions 615 or 625. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown).

The bridging oligos corresponding to an A/T SNP 613 or a G/C SNP 633 are introduced and allowed to bind 604 to the region of the selected nucleic acid region 615 or 625 between the first 605 and second 607 nucleic acid-complementary regions of the fixed sequence oligonucleotides. Alternatively, the bridging oligos 613, 633 can be introduced to the sample simultaneously with the fixed sequence oligonucleotides. The bound oligonucleotides are ligated 606 in the single reaction mixture to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest.

Following ligation, the separate reactions are preferably combined for the universal amplification and detection steps. Universal primers 617, 619 are introduced to the combined reactions to amplify 608 the ligated template regions and create 610 products 627, 629 that comprise the sequence of the nucleic acid region of interest representing both SNPs in the selected nucleic acid region. These products 627, 629 are detected and quantified through sequence determination of the product, through the allele index and/or the region of the product containing the SNP in the selected nucleic acid region.

Preferably, the products of the FIG. 6 methods are detected and quantified through sequence determination of the allele indices, thus obviating the need for determining the actual sequences of the selected nucleic acid region. In other aspects, however, it is desirable to determine the product comprising sequences of both the index and the selected nucleic acid region, for example, to provide internal confirmation of the results or where the index provides sample information and is not informative of the selected nucleic acid region.

The indices used with the assay systems of the invention can also be used to identify polymorphisms that are associated with the fixed sequences used for the detection of nucleic acids of interest. Thus, in another exemplary assay system, an allele index is associated with an allele-specific fixed sequence oligonucleotide, and the allele detection results from the sequencing of an allele index or alternatively from hybridizing of an allele index which is provided in the nucleic acid region primer. The allele index is embedded in either the allele-specific first or second fixed sequence oligonucleotide used in the set for a selected nucleic acid region containing a polymorphism. In specific aspects, an allele index is present on both the first and second fixed sequence oligonucleotides to detect two or more polymorphisms within the fixed sequence regions. The number of fixed sequence oligonucleotides used in such aspects can correspond to the number of possible alleles being assessed for a selected nucleic acid region, and sequence determination or hybridization of the allele index can detect the presence, amount or absence of a specific allele in a genetic sample.

FIG. 7 illustrates this aspect of the invention. In FIG. 7, three fixed sequence oligonucleotides 701, 703 and 723 are used. Two of the fixed sequence oligonucleotides 701, 723 are allele-specific, comprising a region complementary to an allele in a nucleic acid region comprising for example an A/T or G/C SNP, respectively. Each allele-specific fixed sequence oligonucleotide 701, 723 also comprises a corresponding allele index 721, 731 and a universal primer sequence 709. The second fixed sequence oligonucleotide 703 has another universal primer sequence 711, and these universal primer sequences are used to amplify the nucleic acid regions following initial selection and/or isolation of the nucleic acid regions from the genetic sample. The universal primer sequences are located at the ends of the fixed sequence oligonucleotides 701, 703, 723 flanking the indices and the nucleic acid regions of interest, and thus preserve the nucleic acid-specific sequences and the indices in the products of any universal amplification methods.

The fixed sequence oligonucleotides 701, 703, 723 are introduced 702 to the DNA sample 700 and allowed to specifically bind to the selected nucleic acid region 715, 725. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligos 713 are introduced and allowed to bind 704 to the nucleic acid 715 complementary to the region between the first allele-specific fixed sequence oligonucleotide region 705 and the other fixed sequence oligonucleotide region 707 or to the nucleic acid 725 complementary to the region between the second allele-specific fixed sequence oligonucleotide region 735 and the other fixed sequence oligonucleotide region 707. Alternatively, the bridging oligos 713 can be introduced to the sample simultaneously with the sets of fixed sequence oligonucleotides.

The bound oligonucleotides are ligated 706 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest. The ligation primarily occurs only when the allele-specific ends match. Following ligation, universal primers 717, 719 are introduced to amplify 708 the ligated template region to create 710 products 727, 729 that comprise the sequence of the nucleic acid region of interest representing both SNPs in the selected nucleic acid region. These products 727, 729 are detected and quantified through sequence determination of the product, and in particular the region of the product containing the SNP in the selected nucleic acid region. Alternatively, the products 727, 729 are detected and quantified through hybridization of the allele index to different features on an array. In this detection method, a fluorescent label is incorporated into the products 727, 729 during the universal amplification by amplifying with primers 717 or 719 that are fluorescently labeled. It is important to note that the ligation 706 is allele-specific. In order to make the ligation allele-specific, the allele specifying nucleotide must be close to the ligated end. Typically, the allele-specific nucleotide must be within 5 nucleotides of the ligated end. In a preferred aspect, the allele-specific nucleotide is the terminal base.
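The placement rule for the allele-specific nucleotide can be expressed as a small design check. The sketch below is an illustration only: it assumes "within 5 nucleotides of the ligated end" means the nucleotide occupies one of the five positions nearest that end, with the terminal base counted as position 1. That counting convention, the function names, and the example oligo length are assumptions, not statements from the specification.

```python
def position_from_ligated_end(oligo_length, allele_pos, ligated_end="3prime"):
    """Return the 1-based position of the allele base counted from the ligated end.

    `allele_pos` is 0-based from the 5' end of the oligo. A return value
    of 1 means the allele-specific nucleotide is the terminal base.
    """
    if not 0 <= allele_pos < oligo_length:
        raise ValueError("allele_pos lies outside the oligo")
    if ligated_end == "3prime":
        return oligo_length - allele_pos
    if ligated_end == "5prime":
        return allele_pos + 1
    raise ValueError("ligated_end must be '3prime' or '5prime'")

def is_allele_specific_design(oligo_length, allele_pos, ligated_end="3prime", max_position=5):
    """Check the assumed rule: allele base within `max_position` bases of the ligated end."""
    return position_from_ligated_end(oligo_length, allele_pos, ligated_end) <= max_position

# Hypothetical 25-mer ligated at its 3' end.
terminal = is_allele_specific_design(25, allele_pos=24)   # terminal base
too_far = is_allele_specific_design(25, allele_pos=19)    # sixth base from the end
```

A stricter design would set `max_position=1` to enforce the preferred aspect in which the allele-specific nucleotide is the terminal base.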

In another example, the allele detection results from the hybridization of a locus index to an array. Each allele is detected through an allele-specific labeling step, where each allele is labeled with a spectrally distinct fluorescent label during the universal amplification. FIG. 8 illustrates this aspect of the invention. In FIG. 8, three fixed sequence oligonucleotides 801, 803 and 823 are used. Two of the fixed sequence oligonucleotides 801, 823 are allele-specific, comprising a region matching a particular allele in the same selected nucleic acid region, a corresponding locus index 821 and allele-specific universal primer sequences 809, 839. The matching fixed sequence oligonucleotide 803 has another universal primer sequence 811. The universal primer sequences are used to amplify the different selected nucleic acid regions following initial selection and/or isolation of the selected nucleic acid regions from the genetic sample and to incorporate a label into the amplification products that distinguishes each allele. The universal primer sequences are located at the proximal ends of the fixed sequence oligonucleotides 801, 803, 823 and thus preserve the nucleic acid-specific sequences and the indices in the products of any universal amplification methods. The fixed sequence oligonucleotides 801, 803, 823 are introduced 802 to the DNA sample 800 and allowed to specifically bind to the selected nucleic acid region 815, 825. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligos 813 are introduced and allowed to bind 804 to the region of the selected nucleic acid region 815, 825 between the first 805 and second 807 fixed sequence oligonucleotides and between the first 835 and second 807 fixed sequence oligonucleotides. Alternatively, the bridging oligos 813 can be introduced to the sample simultaneously with the fixed sequence oligonucleotides.

The bound oligonucleotides are ligated 806 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest. The ligation primarily occurs only when the allele-specific ends match. Following ligation, universal primers 817, 819, 837 are introduced to amplify 808 the ligated template region to create 810 products 827, 829 that comprise the sequence of the nucleic acid region of interest representing both SNPs in the selected nucleic acid region. The universal primers 817 and 837 have spectrally distinct fluorescent labels such that the allele-specific information is retained through these fluorescent labels. These products 827, 829 are detected and quantified through hybridization of the locus index 821 to an array and imaging to determine the incorporation of the fluorescent label. It is important to note that the ligation 806 is preferably allele-specific. In order to make the ligation allele-specific, the allele specifying nucleotide must be close to the ligated end. Typically, the allele-specific nucleotide must be within 5 nucleotides of the ligated end. In a preferred aspect, the allele-specific nucleotide is the terminal base.
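One way to picture the two-label readout is as a pair of fluorescence signals per array feature, one for each spectrally distinct label. The sketch below converts such pairs into a per-locus allele fraction. It is a simplified illustration that assumes the signals are already background-corrected and that the two labels are equally bright; the feature name and signal values are hypothetical.

```python
def allele_fractions(feature_signals):
    """Compute the fraction of signal from the first label at each array feature.

    `feature_signals` maps a locus index (one array feature) to a tuple
    (signal_label_1, signal_label_2), one value per fluorescent label.
    A feature with no signal yields None instead of dividing by zero.
    """
    fractions = {}
    for locus_index, (signal_1, signal_2) in feature_signals.items():
        total = signal_1 + signal_2
        fractions[locus_index] = None if total == 0 else signal_1 / total
    return fractions

# Hypothetical intensities for one feature that captures locus index 821.
fractions = allele_fractions({"locus_821": (300.0, 100.0), "empty_feature": (0.0, 0.0)})
```

The sum of the two signals at a feature plays the role of the total count for the region, while their ratio carries the allele-specific information retained by the labels.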

In another aspect, an allele index is present on both the first and second fixed sequence oligonucleotides to detect a polymorphism at both ends with a corresponding spectrally distinct fluorescent label for each fixed sequence oligonucleotide for a given allele. The number of fixed sequence oligonucleotides corresponds to the number of possible alleles being assessed for a selected nucleic acid region. In the above figures and examples, the fixed sequence oligonucleotides are represented as two distinct oligonucleotides. In another aspect, the fixed sequence oligonucleotides may be opposite ends of the same oligonucleotide.

In the aspects described above, the bridging oligos used hybridize to regions of the nucleic acid of interest that are adjacent to the regions complementary to the fixed sequence oligonucleotides, so that when the fixed sequence and bridging oligo(s) specifically hybridize they are directly adjacent to one another for ligation. In other aspects, however, the bridging oligo hybridizes to a region that is not directly adjacent to the region complementary to one or both of the fixed sequence oligos, and an intermediate step requiring extension of one or more of the oligos is necessary prior to ligation.

For example, as illustrated in FIG. 9, each set of oligonucleotides preferably contains two oligonucleotides 901, 903 of fixed sequence and one or more bridging oligonucleotides 913. Each of the fixed sequence oligonucleotides comprises a region complementary to the selected nucleic acid region 905, 907, and preferably universal primer sequences 909, 911, i.e., oligo regions complementary to universal primers. The universal primer sequences 909, 911 are located at or near the ends of the fixed sequence oligonucleotides 901, 903, and thus preserve the nucleic acid-specific sequences in the products of any universal amplification methods. The fixed sequence oligonucleotides 901, 903 are introduced 902 to the genetic sample 900 and allowed to specifically bind to the complementary portions of the nucleic acid region of interest 915. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown). The bridging oligonucleotide is then introduced and allowed to bind 904 to the region of the selected nucleic acid region 915 between the first 901 and second 903 fixed sequence oligonucleotides. Alternatively, the bridging oligo can be introduced simultaneously with the fixed sequence oligonucleotides. In this exemplary aspect, the bridging oligo hybridizes to a region directly adjacent to the first fixed sequence oligo region 905, but is separated by one or more nucleotides from the complementary region of the second fixed sequence oligonucleotide 907. Following hybridization of the fixed sequence and bridging oligos, the bridging oligo 913 is extended 906, e.g., using a polymerase and dNTPs, to fill the gap between the bridging oligo 913 and the second fixed sequence oligo 903. Following extension, the bound oligonucleotides are ligated 908 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest 915. After ligation, universal primers 917, 919 are introduced 910 to amplify the ligated template region to create 912 products 923 that comprise the sequence of the nucleic acid region of interest. These products 923 are optionally isolated, detected, and quantified to provide information on the presence and amount of the selected nucleic acid region in a genetic sample. Preferably, the products are detected and quantified through sequence determination of an identification index 921, or, alternatively, sequence determination of the nucleic acid of interest 915 within the amplification product 923.

In another aspect, as illustrated in FIG. 10, each set of oligonucleotides preferably contains two oligonucleotides 1001, 1003 of fixed sequence and two or more bridging oligonucleotides 1013, 1033 that bind to non-adjacent regions on a nucleic acid of interest 1015. Each of the fixed sequence oligonucleotides comprises a region complementary to the selected nucleic acid region 1005, 1007, and preferably universal primer sequences 1009, 1011, i.e., oligo regions complementary to universal primers. The universal primer sequences 1009, 1011 are located at or near the ends of the fixed sequence oligonucleotides 1001, 1003, and thus preserve the nucleic acid-specific sequences in the products of any universal amplification methods. The fixed sequence oligonucleotides 1001, 1003 are introduced 1002 to the genetic sample 1000 and allowed to specifically bind to the complementary portions of the nucleic acid region of interest 1015. Following hybridization, the unhybridized fixed sequence oligonucleotides are preferably separated from the remainder of the genetic sample (not shown).

In FIG. 10, two separate bridging oligonucleotides 1013, 1033 are introduced and allowed to bind 1004 to the region of the selected nucleic acid region 1015 between but not immediately adjacent to both the first 1001 and second 1003 fixed sequence oligonucleotides. Alternatively, the bridging oligos can be introduced simultaneously with the fixed sequence oligonucleotides. In this exemplary aspect, the first bridging oligo 1033 hybridizes to a region directly adjacent to the first fixed sequence oligo region 1005, but is separated by one or more nucleotides from the complementary region of the second bridging oligo 1013. The second bridging oligo 1013 is also separated from the second fixed sequence oligonucleotide 1007 by one or more nucleotides. Following hybridization of the fixed sequence and bridging oligos, both bridging oligos 1013, 1033 are extended 1006, e.g., using a polymerase and dNTPs, to fill the gap between the bridging oligos and the gap between the second bridging oligo 1013 and the second fixed sequence oligo 1003. Following extension, the bound oligonucleotides are ligated 1008 to create a contiguous nucleic acid spanning and complementary to the nucleic acid region of interest 1015. Following ligation, universal primers 1017, 1019 are introduced 1010 to amplify the ligated template region to create 1012 products 1023 that comprise the sequence of the nucleic acid region of interest. These products 1023 are optionally isolated, detected, and quantified to provide information on the presence and amount of the selected nucleic acid region in a genetic sample. Preferably, the products are detected and quantified through sequence determination of an identification index 1021, or, alternatively, sequence determination of the nucleic acid of interest 1015 within the amplification product 1023.
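The extension step can be reasoned about by laying the hybridized oligos out along the target and measuring the unoccupied stretch between each neighboring pair. The sketch below does this for an arrangement resembling FIG. 10. The coordinates are invented for illustration, intervals are half-open positions on the target, and the labels are only mnemonic; nothing here is taken from the figures beyond the general layout.

```python
def gaps_to_fill(footprints):
    """Return the gap length between each pair of neighboring hybridized oligos.

    `footprints` is a list of (name, start, end) tuples giving the
    half-open interval [start, end) each oligo covers on the target.
    A gap of 0 means the two oligos abut and can be ligated directly;
    a positive gap is the number of nucleotides extension must add.
    """
    ordered = sorted(footprints, key=lambda f: f[1])
    gaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b < end_a:
            raise ValueError(f"{name_a} and {name_b} overlap on the target")
        gaps.append((name_a, name_b, start_b - end_a))
    return gaps

# Hypothetical layout: the first bridging oligo abuts the first fixed
# region, and two 3-nucleotide gaps remain to be filled by extension.
layout = [
    ("fixed_1005", 0, 20),
    ("bridge_1033", 20, 25),
    ("bridge_1013", 28, 33),
    ("fixed_1007", 36, 56),
]
gaps = gaps_to_fill(layout)
```

In this layout the two positive gaps correspond to the two extension events described above, after which all four oligos are contiguous and can be ligated.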

In specific aspects, such as the aspect illustrated in FIG. 11, the single fixed sequence oligonucleotide 1101 is complementary to the selected nucleic acid region 1115 on both ends. When this single fixed sequence oligonucleotide 1101 hybridizes to the selected nucleic acid region 1115, it forms a pre-circle oligonucleotide 1103 where the ends are separated by several nucleotides. The bridging oligonucleotide 1113 then binds between the complementary regions 1105, 1107 of the pre-circle oligonucleotide 1103 to fill this gap. The oligonucleotide regions 1105, 1107 of the pre-circle oligonucleotide 1103 bound to the genetic sample 1115 are then ligated together with the bridging oligonucleotide 1113, forming a complete circle.

The circular template is then preferably cleaved, and amplified using one or more of the universal primer sites. In specific aspects, a single universal primer region is used to replicate the template using techniques such as rolling circle replication, as disclosed in Lizardi et al., U.S. Pat. No. 6,558,928. In a preferred aspect, as illustrated in FIG. 11, this fixed sequence oligonucleotide has two universal priming sites 1109, 1111 on the circular template and optionally one or more indices 1121 between the ends that are complementary to the selected nucleic acid region. Preferably, a cleavage site 1123 exists between the two universal priming sites. Once the oligonucleotide is circularized through ligation to the bridging oligo 1113, a nuclease can be used to remove all or most uncircularized oligonucleotides. After the removal of the uncircularized oligonucleotides, the circularized oligonucleotide is cleaved 1106, preserving and in some aspects exposing the universal priming sites 1109, 1111. Universal primers 1117, 1119 are added 1108 and a universal amplification occurs 1110 to create 1112 products 1125 that comprise the sequence of the nucleic acid region of interest. The products 1125 are detected and quantified through sequence determination of the selected nucleic acid region or alternatively the index, which obviates the need for determining the actual sequences of the selected nucleic acid region. In other aspects, however, it is desirable to determine the product comprising sequences of both the index and the selected nucleic acid region, for example, to provide internal confirmation of the results or where the index provides sample information and is not informative of the selected nucleic acid region. As mentioned above, this single fixed sequence oligonucleotide methodology may be applied to any of the examples in FIGS. 1-10.

Resequencing

In a particular aspect, the assay system of the invention can be used to resequence a complex nucleic acid. The tandem ligation methods have been found to be exceptionally efficient, and this high efficiency allows the methodology to be expanded to the use of multiple oligos, preferably 2-100 or even more, that bind to nucleic acid regions of interest.

In the preferred aspect, the bridging oligos would be short, preferably between 1-10, more preferably between 2-7, even more preferably between 3-5 nucleotides in length, and the number of bridging oligos used in a tandem ligation reaction would be approximately 10-50. In a preferred aspect, the bridging oligos would be 5 bases in length and there would be approximately 15-30 ligations.

In one example, the bridging oligos might be selected to provide degeneracy for all possible sequence variants for the particular oligo length, for instance all sequence variations of 5-mers. Following the multiple ligations, the ligated oligos can be amplified using the universal amplification techniques described herein, and the amplified products can be subjected to sequence determination to identify the underlying sequence. This multiple ligation assay would provide the ability to target multiple sections of the genome simultaneously through universal amplification of tandem ligation products, and determination of their nucleotide composition.
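The degenerate bridging oligo pool described above can be sketched in a few lines of Python. This is an illustrative enumeration only; the function name `all_kmers` and the four-letter DNA alphabet are assumptions made for the sketch and are not part of the described assay.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Return every possible sequence of length k over the given alphabet.

    Hypothetical helper: it only enumerates sequences and says nothing
    about synthesis or hybridization behavior.
    """
    return ["".join(bases) for bases in product(alphabet, repeat=k)]


# A fully degenerate pool of 5-mer bridging oligos contains 4**5 members.
five_mers = all_kmers(5)
print(len(five_mers))
```

For 5-mers the pool has 4^5 = 1024 members, which indicates the size of the degenerate set that such a design would need to cover.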

Universal Amplification

In preferred aspects of the invention, universal amplification is used to amplify the ligation products created following hybridization of the fixed sequence oligonucleotides and the bridging oligonucleotides. In a multiplexed assay system, this is preferably done through universal amplification of the various nucleic acid regions to be analyzed using the assay systems of the invention. Universal primer sequences are added to the contiguous ligation products so that they may be amplified in a single universal amplification reaction. These universal primer sequences are preferably introduced in the fixed sequence oligonucleotides, although they may also be added to the proximal ends of the contiguous ligation products following ligation. The introduction of universal primer regions to the fixed sequence oligonucleotides allows a subsequent controlled universal amplification of all or a portion of selected nucleic acids prior to or during analysis, e.g. sequence determination.

Bias and variability can be introduced during DNA amplification, such as that seen during polymerase chain reaction (PCR). In cases where an amplification reaction is multiplexed, there is the potential that loci will amplify at different rates or efficiency. Part of this may be due to the variety of primers in a multiplex reaction with some having better efficiency (i.e. hybridization) than others, or some working better in specific experimental conditions due to the base composition. Each set of primers for a given locus may behave differently based on sequence context of the primer and template DNA, buffer conditions, and other conditions.
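The effect of locus-to-locus differences in amplification efficiency can be illustrated with the standard idealized exponential model of PCR, in which the copy number after n cycles is the starting copy number multiplied by (1 + E)^n for a per-cycle efficiency E. The sketch below is an assumption-laden illustration: the model, the function name `amplified_copies`, and the example efficiencies and cycle number are not taken from the assays described herein.

```python
def amplified_copies(start_copies, efficiency, cycles):
    """Idealized exponential amplification: N = N0 * (1 + E) ** n.

    `efficiency` is the per-cycle efficiency E (1.0 means perfect doubling).
    Plateau effects and reagent depletion are ignored in this sketch.
    """
    return start_copies * (1.0 + efficiency) ** cycles


# Two loci that start at equal abundance but amplify with slightly
# different (hypothetical) efficiencies diverge as cycles accumulate.
locus_a = amplified_copies(1000, 0.95, 15)
locus_b = amplified_copies(1000, 0.90, 15)
bias_ratio = locus_a / locus_b
print(round(bias_ratio, 2))
```

Under this model a small efficiency difference compounds over cycles, which is one reason a shared universal primer pair and a limited number of locus-specific cycles may be attractive.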

The whole tandem ligation reaction or an aliquot of the tandem ligation reaction may be used for the universal amplification. Using an aliquot allows different amplification reactions to be undertaken using the same or different conditions (e.g., polymerase, buffers, and the like), e.g., to ensure that bias is not inadvertently introduced due to experimental conditions. In addition, variations in primer concentrations may be used to effectively limit the number of sequence specific amplification cycles.

In certain aspects, the universal primer regions of the primers or adapters used in the assay system are designed to be compatible with conventional multiplexed assay methods that utilize general priming mechanisms to analyze large numbers of nucleic acids simultaneously. Such “universal” priming methods allow for efficient, high volume analysis of the quantity of nucleic acid regions present in a genetic sample, and allow for comprehensive quantification of the presence of nucleic acid regions within such a genetic sample for the determination of aneuploidy.

Examples of such assay methods include, but are not limited to, multiplexing methods used to amplify and/or genotype a variety of samples simultaneously, such as those described in Oliphant et al., U.S. Pat. No. 7,582,420.

Some aspects utilize coupled reactions for multiplex detection of nucleic acid sequences where oligonucleotides from an early phase of each process contain sequences which may be used by oligonucleotides from a later phase of the process. Exemplary processes for amplifying and/or detecting nucleic acids in samples can be used, alone or in combination, including but not limited to the methods described below, each of which is incorporated by reference in its entirety.

In certain aspects, the assay system of the invention utilizes one of the following combined selective and universal amplification techniques: (1) LDR coupled to PCR; (2) primary PCR coupled to secondary PCR coupled to LDR; and (3) primary PCR coupled to secondary PCR. Each of these aspects of the invention has particular applicability in detecting certain nucleic acid characteristics. However, each requires the use of coupled reactions for multiplex detection of nucleic acid sequence differences where oligonucleotides from an early phase of each process contain sequences which may be used by oligonucleotides from a later phase of the process.

Barany et al., U.S. Pat. Nos. 6,852,487, 6,797,470, 6,576,453, 6,534,293, 6,506,594, 6,312,892, 6,268,148, 6,054,564, 6,027,889, 5,830,711, 5,494,810, describe the use of the ligase chain reaction (LCR) assay for the detection of specific sequences of nucleotides in a variety of nucleic acid samples.

Barany et al., U.S. Pat. Nos. 7,807,431, 7,455,965, 7,429,453, 7,364,858, 7,358,048, 7,332,285, 7,320,865, 7,312,039, 7,244,831, 7,198,894, 7,166,434, 7,097,980, 7,083,917, 7,014,994, 6,949,370, 6,852,487, 6,797,470, 6,576,453, 6,534,293, 6,506,594, 6,312,892, and 6,268,148 describe the use of the ligase detection reaction (“LDR”) coupled with polymerase chain reaction (“PCR”) for nucleic acid detection.

Barany et al., U.S. Pat. Nos. 7,556,924 and 6,858,412, describe the use of padlock probes (also called “precircle probes” or “multi-inversion probes”) with coupled ligase detection reaction (“LDR”) and polymerase chain reaction (“PCR”) for nucleic acid detection.

Barany et al., U.S. Pat. Nos. 7,807,431, 7,709,201, and 7,198,814 describe the use of combined endonuclease cleavage and ligation reactions for the detection of nucleic acid sequences.

Willis et al., U.S. Pat. Nos. 7,700,323 and 6,858,412, describe the use of precircle probes in multiplexed nucleic acid amplification, detection and genotyping, including

Ronaghi et al., U.S. Pat. No. 7,622,281 describes amplification techniques for labeling and amplifying a nucleic acid using an adapter comprising a unique primer and a barcode.

In addition to the various amplification techniques, numerous methods of sequence determination are compatible with the assay systems of the inventions. Preferably, such methods include “next generation” methods of sequencing. Exemplary methods for sequence determination include, but are not limited to, hybridization-based methods, such as disclosed in Drmanac, U.S. Pat. Nos. 6,864,052; 6,309,824; and 6,401,267; and Drmanac et al, U.S. patent publication 2005/0191656, which are incorporated by reference; sequencing by synthesis methods, e.g., Nyren et al, U.S. Pat. Nos. 7,648,824, 7,459,311 and 6,210,891; Balasubramanian, U.S. Pat. Nos. 7,232,656 and 6,833,246; Quake, U.S. Pat. No. 6,911,345; Li et al, Proc. Natl. Acad. Sci., 100: 414-419 (2003); pyrophosphate sequencing as described in Ronaghi et al., U.S. Pat. Nos. 7,648,824, 7,459,311, 6,828,100, and 6,210,891; and ligation-based sequencing determination methods, e.g., Drmanac et al., U.S. Pat. Appln No. 20100105052, and Church et al, U.S. Pat. Appln Nos. 20070207482 and 20090018024.

Alternatively, nucleic acid regions of interest can be selected and/or identified using hybridization techniques. Methods for conducting polynucleotide hybridization assays have been well developed in the art. Hybridization assay procedures and conditions will vary depending on the application and are selected in accordance with the general binding methods known, including those referred to in: Maniatis et al. Molecular Cloning: A Laboratory Manual (2nd Ed. Cold Spring Harbor, N.Y., 1989); Berger and Kimmel Methods in Enzymology, Vol. 152, Guide to Molecular Cloning Techniques (Academic Press, Inc., San Diego, Calif., 1987); Young and Davis, P.N.A.S, 80: 1194 (1983). Methods and apparatus for carrying out repeated and controlled hybridization reactions have been described in U.S. Pat. Nos. 5,871,928, 5,874,219, 6,045,996, 6,386,749, and 6,391,623, each of which is incorporated herein by reference.

The present invention also contemplates signal detection of hybridization between ligands in certain preferred aspects. See U.S. Pat. Nos. 5,143,854, 5,578,832; 5,631,734; 5,834,758; 5,936,324; 5,981,956; 6,025,601; 6,141,096; 6,185,030; 6,201,639; 6,218,803; and 6,225,625, in U.S. Patent application 60/364,731 and in PCT Application PCT/US99/06097 (published as WO99/47964), each of which also is hereby incorporated by reference in its entirety for all purposes.

Methods and apparatus for signal detection and processing of intensity data are disclosed in, for example, U.S. Pat. Nos. 5,143,854, 5,547,839, 5,578,832, 5,631,734, 5,800,992, 5,834,758; 5,856,092, 5,902,723, 5,936,324, 5,981,956, 6,025,601, 6,090,555, 6,141,096, 6,185,030, 6,201,639; 6,218,803; and 6,225,625, in U.S. Patent application 60/364,731 and in PCT Application PCT/US99/06097 (published as WO99/47964), each of which also is hereby incorporated by reference in its entirety for all purposes.

Use of Indices in the Assay Systems of the Invention

In certain aspects, all or a portion of the sequences of the nucleic acids of interest are directly detected using the described techniques, e.g., sequence determination or hybridization. In certain aspects, however, the nucleic acids of interest are associated with one or more indices that are identifying for a selected nucleic acid region or a particular sample being analyzed. The detection of the one or more indices can serve as a surrogate detection mechanism of the selected nucleic acid region, or as confirmation of the presence of a particular selected nucleic acid region if both the sequence of the index and the sequence of the nucleic acid region itself are determined. These indices are preferably associated with the selected nucleic acids during an amplification step using primers that comprise both the index and sequence regions that specifically hybridize to the nucleic acid region.

In one example, the primers used for amplification of a selected nucleic acid region are designed to provide a locus index between the selected nucleic acid region primer region and a universal amplification region. The locus index is unique for each selected nucleic acid region and representative of a locus on a chromosome of interest or reference chromosome, so that quantification of the locus index in a sample provides quantification data for the locus and the particular chromosome containing the locus.

In another example, the primers used for amplification of a selected nucleic acid region are designed to provide an allele index between the selected nucleic acid region primer region and a universal amplification region. The allele index is unique for particular alleles of a selected nucleic acid region and representative of a locus variation present on a chromosome of interest or reference chromosome, so that quantification of the allele index in a sample provides quantification data for the allele and the summation of the allelic indices for a particular locus provides quantification data for both the locus and the particular chromosome containing the locus.
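The relationship between allele-level and locus-level quantification described above can be sketched as a simple tally. In this illustrative Python sketch, the index sequences, the locus names, and the mapping `ALLELE_INDEX_MAP` are hypothetical placeholders; they are not index sequences used in the assays described herein.

```python
from collections import Counter

# Hypothetical allele indices mapped to (locus, allele). In practice this
# table would come from the oligonucleotide design for the assay set.
ALLELE_INDEX_MAP = {
    "ACGTACGTA": ("chr21_locus1", "A"),
    "TGCATGCAT": ("chr21_locus1", "G"),
    "GGATCCGGA": ("chr18_locus7", "C"),
    "CCTAGGCCT": ("chr18_locus7", "T"),
}


def count_alleles_and_loci(observed_indices, index_map=ALLELE_INDEX_MAP):
    """Tally allele indices, then sum the alleles of each locus."""
    allele_counts = Counter()
    locus_counts = Counter()
    for index in observed_indices:
        if index not in index_map:
            continue  # unrecognized index: ignored in this sketch
        locus, allele = index_map[index]
        allele_counts[(locus, allele)] += 1
        locus_counts[locus] += 1  # locus count = sum over its allele indices
    return allele_counts, locus_counts


observed = ["ACGTACGTA", "ACGTACGTA", "TGCATGCAT", "GGATCCGGA", "NNNNNNNNN"]
alleles, loci = count_alleles_and_loci(observed)
print(dict(alleles))
print(dict(loci))
```

Keeping both tallies lets the same detection data serve allele-level questions and locus-level or chromosome-level quantification.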

In another aspect, the primers used for amplification of the selected nucleic acid regions to be analyzed for a genetic sample are designed to provide an identification index between the selected nucleic acid region primer region and a universal amplification region. In such an aspect, a sufficient number of identification indices are present to uniquely identify each selected nucleic acid region in the sample. Each nucleic acid region to be analyzed is associated with a unique identification index, so that the identification index is uniquely associated with the selected nucleic acid region. Quantification of the identification index in a sample provides quantification data for the associated selected nucleic acid region and the chromosome corresponding to the selected nucleic acid region. The identification index may also be used to detect any amplification bias that occurs downstream of the initial isolation of the selected nucleic acid regions from a sample.

In certain aspects, only the locus index and/or the identification index (if present) are detected and used to quantify the selected nucleic acid regions in a sample. In another aspect, a count of the number of times each locus index occurs with a unique identification index is done to determine the relative frequency of a selected nucleic acid region in a sample.

In some aspects, indices representative of the sample from which a nucleic acid is isolated are used to identify the source of the nucleic acid in a multiplexed assay system. In such aspects, the nucleic acids are uniquely identified with the sample index. Those uniquely identified oligonucleotides may then be combined into a single reaction vessel with nucleic acids from other samples prior to sequencing. The sequencing data is first segregated by each unique sample index prior to determining the frequency of each target locus for each sample and prior to determining whether there is a chromosomal abnormality for each sample. For detection, the sample indices, the locus indices, and the identification indices (if present), are sequenced.
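The segregation of pooled sequencing data by sample index, followed by per-sample locus frequency determination, can be sketched as follows. This Python sketch assumes each read has already been reduced to a (sample index, locus index) pair; the read representation, the labels "S1" and "L1", and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def demultiplex_and_count(reads):
    """Group reads by sample index, then count locus indices per sample.

    `reads` is an iterable of (sample_index, locus_index) pairs.
    """
    per_sample = defaultdict(Counter)
    for sample_index, locus_index in reads:
        per_sample[sample_index][locus_index] += 1
    return per_sample


def locus_frequencies(locus_counts):
    """Convert raw locus counts for one sample into relative frequencies."""
    total = sum(locus_counts.values())
    return {locus: count / total for locus, count in locus_counts.items()}


pooled_reads = [("S1", "L1"), ("S1", "L1"), ("S1", "L2"), ("S2", "L1")]
counts = demultiplex_and_count(pooled_reads)
for sample, locus_counts in counts.items():
    print(sample, locus_frequencies(locus_counts))
```

Segregating by sample index first keeps each sample's locus frequencies independent of how many reads the other pooled samples contributed.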

In aspects of the invention using indices, the fixed sequence oligonucleotides are preferably designed to comprise the indices. Alternatively, the indices and universal amplification sequences can be added to the selectively amplified nucleic acids following initial amplification. In either case, preferably the indices are encoded upstream of the nucleic acid region-specific sequences but downstream of the universal primers so that they are preserved upon amplification, but also require less sequencing to access when using the universal primers for sequence determination.

The indices are non-complementary but unique sequences used within the primer to provide information relevant to the selected nucleic acid region that is isolated and/or amplified using the primer. The advantage of this is that information on the presence and quantity of the selected nucleic acid region can be obtained without the need to determine the actual sequence itself, although in certain aspects it may be desirable to do so. Generally, however, the ability to identify and quantify a selected nucleic acid region through identification of one or more indices will decrease the length of sequencing required as the loci information is captured at the 3′ or 5′ end of the isolated selected nucleic acid region. Use of index identification as a surrogate for identification of selected nucleic acid regions may also reduce error, since longer sequencing reads are more prone to the introduction of error.

In addition to locus indices, allele indices and identification indices, additional indices can be introduced to primers to assist in the multiplexing of samples. For example, correction indices which identify experimental error (e.g., errors introduced during amplification or sequence determination) can be used to identify potential discrepancies in experimental procedures and/or detection methods in the assay systems. The order and placement of these indices, as well as the length of these indices, can vary, and they can be used in various combinations.

The primers used for identification and quantification of a selected nucleic acid region may be associated with regions complementary to the 5′ of the selected nucleic acid region, regions complementary to the 3′ of the selected nucleic acid region, or in certain amplification regimes the indices may be present on one or both of a set of amplification primers which comprise sequences complementary to the sequences of the selected nucleic acid region. The primers can be used to multiplex the analysis of multiple selected nucleic acid regions to be analyzed within a sample, and can be used either in solution or on a solid substrate, e.g., on a microarray or on a bead. These primers may be used for linear replication or amplification, or they may create circular constructs for further analysis.

Comparative Hybridization for Identification of Differential Frequency of Loci and Alleles

In a specific aspect of the invention, an assay system of the invention employs two index sequences that allow direct comparison of levels of particular genomic regions in a sample using array hybridization. The assay employs directed analysis assays to select specific loci of interest using labeled oligonucleotides that selectively hybridize to two or more genomic regions within a sample. The oligonucleotides that selectively hybridize to different regions are differentially labeled so that the specific locus associated with a label can be identified. Preferably, the label is an optically detectable label (e.g., using a fluorescent label).

The first fixed oligonucleotide comprises sequences that selectively hybridize to a feature on the array (generally an oligonucleotide that is complementary to the first index) and a region that selectively hybridizes to a region of interest. The second fixed oligonucleotide comprises a region that selectively hybridizes to the same region of interest, either adjacently or within a selected number of intervening bases, and a region used to associate the oligonucleotide to the label. Where the fixed oligonucleotides bind to immediately adjacent regions within the genomic region, the fixed oligonucleotides are ligated to create a contiguous ligation product comprising a locus- or allele-specific label that can be introduced to an array. In the case where the set of fixed oligonucleotides do not bind to immediately adjacent regions within the genomic region, the intervening region can be closed using primer extension, as discussed in previous sections, and/or one or more bridging oligonucleotides can be used that hybridize between the fixed sequence oligonucleotides. These are then ligated to create a contiguous ligation product with a locus- or allele-specific label.

In one aspect, the sets of fixed sequence oligonucleotides are used in pairs, with each member of the pair selective for a different chromosome or locus. FIG. 12 illustrates an example, where each set of the pair is selective for a genomic region on a different chromosome. Assay sets for two different chromosomes are evaluated competitively on a single hybridization feature on the array. Two fixed oligonucleotide sets 1208, 1210 selective for genomic regions 1202, 1204 on the two different chromosomes are used to create the contiguous ligation products. Each set of fixed oligonucleotides is associated with an optically differentiated label 1201, 1203. Sequences complementary to universal primers 1205, 1211 are located at or near the ends of the fixed sequence oligonucleotides. Both of the contiguous ligation products created from the fixed oligonucleotides comprise sequences 1207 that are complementary to the same array feature. The level of hybridization of the optically differentiated contiguous ligation products can be measured to provide the relative amount of the first chromosome region as compared to the second chromosome region.

In another aspect, the sets of fixed sequence oligonucleotides are used in pairs, with each member of the pair selective for a different allele of a particular locus. FIG. 13 illustrates such an example, where each set of the pair is selective for a different allele of the same locus. Assay sets for two different alleles are evaluated competitively on a single hybridization feature on the array. Two fixed oligonucleotide sets 1308, 1310 selective for a locus 1302 are used to create the contiguous ligation products. Each set of fixed oligonucleotides is associated with an optically differentiated label 1301, 1303. Sequences complementary to universal primers 1305, 1311 are located at or near the ends of the fixed sequence oligonucleotides. Both of the contiguous ligation products created from the fixed oligonucleotides comprise sequences 1307 that are complementary to the same array feature. The level of hybridization of the optically differentiated contiguous ligation products can be measured to provide the relative amount of the first allele as compared to the second allele.

It will be apparent to those skilled in the art that various different regions can be used in such comparative hybridization assays, including sequences that are specific to different loci on a single chromosome, loci that are similar but polymorphic between chromosomes (e.g., loci on the pseudo-autosomal region of chromosomes X and Y) and the like.

In a specific aspect, the detected levels of the contiguous hybridization products are normalized to reduce any assay-specific or technical variation in hybridization. The ratio of detection of the two labels for an assay pair, and for many assay pairs from the same chromosomes, is expected to be one if the regions are in the same relative abundance in a sample. A variance in the amount of one region (e.g., from a first chromosome) will cause a variance from the expected color ratio of one. A more abundant chromosome is expected to have the brighter color in the assay pair comparisons.
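One possible way to express the normalized two-label comparison is sketched below in Python. The normalization scheme shown here, dividing each test-sample label ratio by the ratio observed for the same assay pair in a reference sample with equal abundance of both regions and then taking the median across assay pairs, is an assumption made for illustration and is not stated to be the method used herein. The function names and intensity values are hypothetical.

```python
from statistics import median


def normalized_ratios(test_pairs, reference_pairs):
    """Return per-assay-pair label ratios corrected for assay-specific bias.

    Each element is a (label_1_intensity, label_2_intensity) tuple. The
    reference sample is assumed to contain both regions at equal abundance,
    so its ratio captures assay-specific variation only.
    """
    ratios = []
    for (t1, t2), (r1, r2) in zip(test_pairs, reference_pairs):
        assay_factor = r1 / r2
        ratios.append((t1 / t2) / assay_factor)
    return ratios


def summary_ratio(test_pairs, reference_pairs):
    """Median normalized ratio across all assay pairs for two chromosomes."""
    return median(normalized_ratios(test_pairs, reference_pairs))


# Hypothetical intensities for three assay pairs on three array features.
test_sample = [(150.0, 100.0), (300.0, 200.0), (75.0, 50.0)]
reference_sample = [(100.0, 100.0), (200.0, 200.0), (50.0, 50.0)]
print(summary_ratio(test_sample, reference_sample))
```

A summary ratio near one would be consistent with equal relative abundance of the two regions, while a sustained departure from one across many assay pairs would suggest a difference in abundance.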

Detection of Other Agents or Risk Factors

Given the multiplexed nature of the assay systems of the invention, in certain aspects it may be beneficial to utilize the assay to detect other nucleic acids that could pose a risk to the health of the subject(s) or otherwise impact on clinical decisions about the treatment or prognostic outcome for a subject. Such nucleic acids could include but are not limited to indicators of disease or risk such as maternal alleles, polymorphisms, or somatic mutations known to present a risk for maternal or fetal health. Such indicators include, but are not limited to, genes associated with Rh status; mutations or polymorphisms associated with diseases such as diabetes, hyperlipidemia, hypercholesterolemia, blood disorders such as sickle cell anemia, hemophilia or thalassemia, cardiac conditions, etc.; exogenous nucleic acids associated with active or latent infections; somatic mutations or copy number variations associated with autoimmune disorders or malignancies (e.g., breast cancer), or any other health issue that may impact on the subject, and in particular on the clinical options that may be available in the treatment and/or prevention of health risks in a subject based on the outcome of the assay results.

Accordingly, as the preferred assay systems of the invention are highly multiplexed and able to interrogate hundreds or even thousands of nucleic acids within a mixed sample, in certain aspects it is desirable to interrogate the sample for nucleic acid markers within the mixed sample, e.g., nucleic acids associated with genetic risk or that identify the presence or absence of infectious organisms. Thus, in certain aspects, the assay systems provide detection of such nucleic acids in conjunction with the detection of nucleic acids for copy number determination within a mixed sample.

For example, in certain mixed samples of interest, including maternal samples, samples from subjects with autoimmune disease, and samples from patients undergoing chemotherapy, the immune suppression of the subject may increase the risk for the disease due to changes in the subject's immune system. Detection of exogenous agents in a mixed sample may be indicative of exposure to and infection by an infectious agent, and this finding may have an impact on patient care or management of an infectious disease for which a subject tests positively for such infectious agent.

Specifically, changes in immunity and physiology during pregnancy may make pregnant women more susceptible to or more severely affected by infectious diseases. In fact, pregnancy itself may be a risk factor for acquiring certain infectious diseases, such as toxoplasmosis, Hansen disease, and listeriosis. In addition, for pregnant women or subjects with suppressed immune systems, certain infectious diseases such as influenza and varicella may have a more severe clinical course, increased complication rate, and higher case-fatality rate. Identification of infectious disease agents may therefore allow better treatment for maternal disease during pregnancy, leading to a better overall outcome for both mother and fetus.

In addition, certain infectious agents can be passed to the fetus via vertical transmission, i.e. spread of infections from mother to baby. These infections may occur while the fetus is still in the uterus, during labor and delivery, or after delivery (such as while breastfeeding).

Thus, in some preferred aspects, the assay system may include detection of exogenous sequences, e.g., sequences from infectious organisms that may have an adverse effect on the health and/or viability of the fetus or infant, in order to protect maternal, fetal, and/or infant health.

Exemplary infections which can be spread via vertical transmission, and which can be tested for using the assay methods of the invention, include but are not limited to congenital infections, perinatal infections and postnatal infections.

Congenital infections are passed in utero by crossing the placenta to infect the fetus. Many infectious microbes can cause congenital infections, leading to problems in fetal development or even death. TORCH is an acronym for several of the more common congenital infections. These are: toxoplasmosis, other infections (e.g., syphilis, hepatitis B, Coxsackie virus, Epstein-Barr virus, varicella-zoster virus (chicken pox), and human parvovirus B19 (fifth disease)), rubella, cytomegalovirus (CMV), and herpes simplex virus.

Perinatal infections refer to infections that occur as the baby moves through an infected birth canal or through contamination with fecal matter during delivery. These infections can include, but are not limited to, sexually-transmitted diseases (e.g., gonorrhea, chlamydia, herpes simplex virus, human papilloma virus, etc.), CMV, and Group B Streptococci (GBS).

Infections spread from mother to baby following delivery are known as postnatal infections. These infections can be spread during breastfeeding through infectious microbes found in the mother's breast milk. Some examples of postnatal infections are CMV, Human immunodeficiency virus (HIV), Hepatitis C Virus (HCV), and GBS.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent or imply that the experiments below are all of or the only experiments performed. It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific aspects without departing from the spirit or scope of the invention as broadly described. The present aspects are, therefore, to be considered in all respects as illustrative and not restrictive.

Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees centigrade, and pressure is at or near atmospheric.

Example 1 General Aspects of the Assay Systems of the Invention

A number of assay formats were tested to demonstrate the ability to perform selective amplification and detection of independent loci, and thereby to demonstrate multiplexed, ligation-based detection of a large number (e.g., 96 or more) of nucleic acid regions of interest using highly multiplexed formats.

These assays were designed based on human genomic sequences, and each interrogation consisted of two fixed sequence oligos per selected nucleic acid region interrogated in the assay. The first oligo, complementary to the 3′ region of a genomic region, comprised the following sequential (5′ to 3′) oligo elements: a universal PCR priming sequence common to all assays: TACACCGGCGTTATGCGTCGAGAC (SEQ ID NO:1); a nine nucleotide identification code specific to the selected loci; a 9 base locus- or locus/allele-specific sequence that acts as a locus code in the first SNP-independent set and a locus/allele code in the SNP-specific second set; a hybridization breaking nucleotide which is different from the corresponding base in the genomic locus; and a 20-24 bp sequence complementary to the selected genomic locus. In cases where a SNP is detected in this portion of the selected genomic locus, the allele-specific interrogation set consisted of two first tandem ligation primers, each with a different locus/allele code and a different allele-specific base at the SNP position. These first oligos were designed for each selected nucleic acid to provide a predicted uniform Tm with a two degree variation across all interrogations in the 480 assay set.
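The 5′ to 3′ element order of the first fixed sequence oligo described above can be illustrated with a short Python sketch. Only the universal priming sequence (SEQ ID NO:1) and the element order and lengths are taken from the text. The function names, the example codes and target sequence, and the rule used to pick the hybridization breaking nucleotide (a base that is neither the genomic base at that position nor its complement) are assumptions made for illustration.

```python
UNIVERSAL_PRIMER_1 = "TACACCGGCGTTATGCGTCGAGAC"  # SEQ ID NO:1 from the text
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def breaking_base(genomic_base):
    """Pick a base that cannot pair at this position (assumed rule)."""
    excluded = {genomic_base, COMPLEMENT[genomic_base]}
    return next(base for base in "ACGT" if base not in excluded)


def build_first_oligo(id_code, locus_code, genomic_target, flanking_base):
    """Assemble primer + 9-nt ID code + 9-nt locus code + breaker + target.

    `genomic_target` is the 20-24 nt genomic sequence (5' to 3') that the
    oligo should hybridize to; `flanking_base` is the genomic base
    immediately 3' of it, opposite the breaking nucleotide.
    """
    if len(id_code) != 9 or len(locus_code) != 9:
        raise ValueError("identification and locus codes must be 9 nt")
    if not 20 <= len(genomic_target) <= 24:
        raise ValueError("target-complementary region must be 20-24 nt")
    return (UNIVERSAL_PRIMER_1 + id_code + locus_code
            + breaking_base(flanking_base)
            + reverse_complement(genomic_target))


# Hypothetical codes and target, for illustration only.
oligo = build_first_oligo("ACGTACGTA", "TTGCAAGCT",
                          "GATTACAGATTACAGATTAC", "A")
print(oligo, len(oligo))
```

A real design would additionally screen candidate oligos for the predicted Tm window mentioned in the text, which this sketch does not attempt.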

The second fixed sequence oligo, complementary to the 5′ region of the genomic loci, comprised the following sequential (5′ to 3′) elements: a 20-24 base sequence complementary to the 5′ region in the genomic locus; a hybridization breaking nucleotide which was different from the corresponding base in the genomic locus; and a universal PCR priming sequence which was common to all second fixed sequence oligos in the assay set: ATTGCGGGGACCGATGATCGCGTC (SEQ ID NO:2). In cases where a SNP was detected in this portion of the selected genomic locus, the allele-specific interrogation set consisted of two tandem ligation primers, each with a different locus/allele code and a different allele-specific base at the SNP position. This second oligo was designed for each selected nucleic acid to provide a predicted uniform Tm with a two degree variation across all interrogations in the 480 assay set, in substantially the same Tm range as the first oligo set.
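A check that a set of locus-complementary sequences falls within a two degree Tm window might look like the following sketch. The example does not state how Tm was predicted; the Wallace rule used here (2 degrees per A or T, 4 degrees per G or C) is a crude stand-in chosen only for illustration, and reading "two degree variation" as a maximum spread of two degrees is likewise an assumption.

```python
# Illustrative sketch; the Tm model is an assumption, not the one used in the example.
def wallace_tm(seq):
    """Rough Tm estimate in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def within_tm_window(seqs, window=2.0):
    """Return True if the spread of estimated Tm values is at most `window` degrees."""
    tms = [wallace_tm(s) for s in seqs]
    return max(tms) - min(tms) <= window
```

A production design would more likely use a nearest-neighbor thermodynamic model with the actual salt and formamide conditions of the hybridization buffer.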

In certain tested aspects, one or more bridging oligos were used that were complementary to the genomic locus sequence between the regions complementary to the first and second fixed sequence oligos used for each selected nucleic acid region. In specific aspects tested, more than one bridging oligo was used to span the gap between the fixed sequence oligonucleotides, and the one or more bridging oligos may optionally be designed to identify one or more SNPs in the sequence. The length of the bridging oligonucleotides used in the assay systems varied from 5 to 36 base pairs.

All oligonucleotides used in the tandem ligation formats were synthesized using conventional solid-phase chemistry. The oligos of the first fixed set and the bridging oligonucleotides were synthesized with 5′ phosphate moieties to enable ligation to 3′ hydroxyl termini of adjacent oligonucleotides.

Example 2 Preparation of DNA for Use in Tandem Ligation Procedures

Genomic DNA from a Caucasian male (NA12801) or a Caucasian female (NA11995) was obtained from Coriell Cell Repositories (Camden, N.J.) and fragmented by acoustic shearing (Covaris, Woburn, Mass.) to a mean fragment size of approximately 200 bp.

The Coriell DNA was biotinylated using standard procedures. Briefly, the Covaris fragmented DNA was end-repaired by generating the following reaction in a 1.5 ml microtube: 5 μg DNA, 12 μl 10× T4 ligase buffer (Enzymatics, Beverly Mass.), 50 U T4 polynucleotide kinase (Enzymatics, Beverly Mass.), and H₂O to 120 μl. This was incubated at 37° C. for 30 minutes. The DNA was diluted using 10 mM Tris 1 mM EDTA pH 8.5 to a desired final concentration of ~0.5 ng/μl.

5 μl DNA was placed in each well of a 96-well plate, and the plate was sealed with an adhesive plate sealer and spun for 10 seconds at 250×g. The plate was then incubated at 95° C. for 3 minutes, cooled to 25° C., and spun again for 10 seconds at 250×g. A biotinylation master mix was prepared in a 1.5 ml microtube to a final concentration of: 1× TdT buffer (Enzymatics, Beverly Mass.), 8 U TdT (Enzymatics, Beverly Mass.), 250 μM CoCl₂, 0.01 nmol/μl biotin-16-dUTP (Roche, Nutley N.J.), and H₂O to 1.5 ml. 15 μl of the master mix was aliquoted into each well of a 96-well plate, and the plate was sealed with an adhesive plate sealer. The plate was spun for 10 seconds at 250×g and incubated at 37° C. for 60 minutes. Following incubation, the plate was spun again for 10 seconds at 250×g, and 7.5 μl precipitation mix (1 ng/μl Dextran Blue, 3 mM NaOAc) was added to each well.

The plate was sealed with an adhesive plate sealer and mixed using an IKA plate vortexer for 2 minutes at 3000 rpm. 27.5 μl of isopropanol was added into each well, and the plate was sealed with an adhesive plate sealer and vortexed for 5 minutes at 3000 rpm. The plate was spun for 20 minutes at 3000×g, the supernatant was decanted, and the plate was inverted and centrifuged at 10×g for 1 minute onto an absorbent wipe. The plate was air-dried for 5 minutes, and the pellet was resuspended in 10 μl 10 mM Tris pH 8.0, 1 mM EDTA.

Example 3 Exemplary Assay Formats Using Tandem Ligation

Numerous tandem ligation assay formats using the biotinylated DNA were tested to illustrate proof of concept for the assay systems of the invention, and demonstrated the ability to perform highly multiplexed, targeted detection of a large number of independent loci using the series of different assay formats. The exemplary assay systems of the invention were designed to comprise 96 or more interrogations per loci in a genetic sample, and in cases where SNPs were detected the assay formats utilized 192 or more separate interrogations, each utilizing the detection of different alleles per 96 loci in genetic samples. The examples described for each assay format utilized two different sets of fixed sequence oligonucleotides and/or bridging oligos (as described in Example 1), comprising a total of 96 or 192 interrogation reactions for the selected nucleic acid regions depending upon whether SNPs were identified.

A first exemplary assay format used locus-specific fixed sequence oligos and bridging oligos, where there was a one base gap between the first fixed sequence oligo and the bridging oligos, and a second one base gap between the bridging oligos and the second fixed sequence oligo. Each of the two gaps encompassed a different SNP. In this format, a DNA polymerase was used to incorporate each of the SNP bases, and ligase was used to seal the nicks formed thereby. SNP base discrimination derived from the fidelity of base incorporation by the polymerase, and in the event of misincorporation, the tendency of ligase to not seal nicks adjacent to mismatched bases.

The second exemplary assay format used two locus-specific fixed sequence oligonucleotides without a bridging oligo, where there was a ~15-35 base gap between the fixed sequence oligos, and where the gap spanned one or more SNPs. In this format, a polymerase was used to incorporate the missing bases, and a ligase was used to seal the nick formed thereby. SNP base discrimination derived from the fidelity of base incorporation by the polymerase, and in the event of misincorporation, the tendency of ligase to not seal nicks adjacent to mismatched bases.

A third exemplary assay format used allele-specific first and second fixed sequence oligos without a bridging oligo, where there was a ~15-35 base gap between the first and second fixed sequence oligos, and where the gap spanned one or more SNPs. Two separate allele-specific first fixed sequence oligos and two separate allele-specific second fixed sequence oligos were used. A polymerase was used to incorporate the missing bases, and a ligase was used to seal the nick formed thereby. SNP base discrimination derived from hybridization specificity, the tendency of non-proofreading polymerase to not extend annealed primers with mismatches near the 3′ end, and the tendency of the ligase to not seal nicks adjacent to mismatched bases.

A fourth exemplary format used allele-specific fixed sequence oligos and a locus-specific bridging oligo. In this format, two separate first fixed sequence oligos complementary to the 3′ end of the loci of interest were used, the first with a 3′ base specific for one allele of the targeted SNP, and the second with a 3′ base specific for the other allele of the targeted SNP. Similarly, two separate second fixed sequence oligos were used, the first with a 5′ base specific for one allele of a second targeted SNP, and the second with a 5′ base specific for the other allele of the second targeted SNP. The bridging oligos were complementary to the region directly adjacent to the locus regions complementary to the first and second fixed sequence oligos, and thus no polymerase was needed prior to ligation. Ligase was used to seal the nicks between the fixed sequence oligos and the bridging oligo. SNP base discrimination in this assay format derived from hybridization specificity and the tendency of the ligase to not seal nicks adjacent to mismatched bases. This exemplary format was tested using either T4 ligase or Taq ligase for creation of the contiguous template, and both proved effective in the reaction as described below.

A fifth exemplary format used locus-specific fixed sequence oligos that were complementary to adjacent regions on the nucleic acid of interest, and thus no gap was created by hybridization of these oligos. In this format, no polymerase was required, and a ligase was used to seal the single nick between the oligos.

A sixth exemplary format used allele-specific fixed sequence oligos and locus-specific bridging oligos, where there was a short gap of five bases between the locus regions complementary to the fixed sequence oligos. The locus-specific bridging oligo in this example was a 5mer complementary to the regions directly adjacent to the regions complementary to the first and second fixed sequence oligos. In this format, no polymerase was required, and a ligase was used to seal the two nicks between the oligos.

A seventh exemplary format used locus-specific fixed sequence oligos and a locus-specific bridging oligo, where there was a short gap of five bases containing a SNP in the region complementary to the bridging oligo. Allele-specific bridging oligos corresponding to the possible SNPs were included in the hybridization and ligation reaction. In this format, no polymerase was required, and a ligase was used to seal the two nicks between the oligos. SNP base discrimination in this assay format derived from hybridization specificity and the tendency of the ligase to not seal nicks adjacent to mismatched bases.

An eighth exemplary format used locus-specific fixed sequence oligos and two adjacent locus-specific bridging oligos, where there was a 10 base gap between the regions complementary to the first and second fixed sequence oligos. Locus-specific bridging oligos were included in the ligation reaction, with the gap requiring two contiguous 5mers to bridge it. In this format, no polymerase was required, and a ligase was used to seal the three nicks between the oligos.

For each of the above-described assay formats, an equimolar pool (40 nM each) of sets of first and second locus- or allele-specific fixed oligonucleotides was created from the oligos prepared as set forth in Example 1. A separate equimolar pool (20 μM each) of bridging oligonucleotides was likewise created for the assay processes based on the sequences of the selected genomic loci.

10 μg of streptavidin beads were transferred into the wells of a 96-well plate, and the supernatant was removed. 60 μl BB2 buffer (100 mM Tris pH 8.0, mM EDTA, 500 mM NaCl, 58% formamide, 0.17% Tween-80), 10 μL 40 nM fixed sequence oligo pool and 30 μL of the biotinylated template DNA prepared in Example 2 were added to the beads. The plate was sealed with an adhesive plate sealer and vortexed at 3000 rpm until the beads were resuspended. The oligos were annealed to the template DNA by incubation at 70° C. for 5 minutes, followed by slow cooling to room temperature.

The plate was placed on a raised bar magnetic plate for 2 minutes to pull the magnetic beads and associated DNA to the side of the wells. The supernatant was removed by pipetting, and was replaced with 50 μL of 60% BB2 (v/v in water). The beads were resuspended by vortexing, placed on the magnet again, and the supernatant was removed. This bead wash procedure was repeated once using 50 μL 60% BB2, and repeated twice more using 50 μL wash buffer (10 mM Tris pH 8.0, 1 mM EDTA, 50 mM NaCl).

The beads were resuspended in 37 μl ligation reaction mix consisting of 1× Taq ligase buffer (Enzymatics, Beverly Mass.), 10 U Taq ligase, and 2 μM bridging oligo pool (depending on the assay format), and incubated at 37° C. for one hour. Where appropriate, and depending on the assay format, a non-proofreading thermostable polymerase plus 200 nM each dNTP was included in this mixture. The plate was placed on a raised bar magnetic plate for 2 minutes to pull the magnetic beads and associated DNA to the side of the wells. The supernatant was removed by pipetting, and was replaced with 50 μL wash buffer. The beads were resuspended by vortexing, placed on the magnet again, and the supernatant was removed. The wash procedure was repeated once.

To elute the products from the streptavidin beads, 30 μl of 10 mM Tris 1 mM EDTA, pH 8.0 was added to each well of the 96-well plate. The plate was sealed and mixed using an IKA vortexer for 2 minutes at 3000 rpm to resuspend the beads. The plate was incubated at 95° C. for 1 minute, and the supernatant was aspirated using an 8-channel pipetter. 25 μl of supernatant from each well was transferred into a fresh 96-well plate for universal amplification.

Example 4 Universal Amplification of Tandem Ligated Products

The polymerized and/or ligated nucleic acids were amplified using universal PCR primers complementary to the universal sequences present in the first and second fixed sequence oligos hybridized to the nucleic acid regions of interest. 25 μl of each of the reaction mixtures of Example 3 were used in each amplification reaction. Each 50 μL universal PCR reaction consisted of 25 μL eluted ligation product plus 1× Phusion buffer (Finnzymes, Finland), 1 M betaine, 400 nM each dNTP, 1 U Phusion error-correcting thermostable DNA polymerase, and the following primer pair:

TAATGATACGGCGACCACCGAGATCTACACCGGCGTTATGCGTCGAGA (SEQ ID NO:3) and

TCAAGCAGAAGACGGCATACGAGATXAAACGACGCGATCATCGGTCCCCGCAA (SEQ ID NO:4), where X represents one of 96 different sample tags used to uniquely identify individual samples prior to pooling and sequencing. The PCR was carried out under stringent conditions using a BioRad Tetrad™ thermocycler.
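The placement of the sample tag X within the second universal primer can be sketched as follows. The tag sequences and their length are not given in the example, so the tags passed in below are purely hypothetical, as are the function names.

```python
# Illustrative sketch; tag sequences and helper names are hypothetical.
PRIMER_5P = "TCAAGCAGAAGACGGCATACGAGAT"      # portion of SEQ ID NO:4 before X
PRIMER_3P = "AAACGACGCGATCATCGGTCCCCGCAA"    # portion of SEQ ID NO:4 after X


def build_tagged_primer(sample_tag):
    """Insert a sample tag at position X of the second universal primer."""
    if not sample_tag or set(sample_tag) - set("ACGT"):
        raise ValueError("sample tag must be a non-empty A/C/G/T string")
    return PRIMER_5P + sample_tag + PRIMER_3P


def assign_tagged_primers(sample_tags):
    """Map each sample name to its tagged primer, rejecting duplicate tags."""
    if len(set(sample_tags.values())) != len(sample_tags):
        raise ValueError("each sample needs a unique tag")
    return {name: build_tagged_primer(tag) for name, tag in sample_tags.items()}
```

After pooling and sequencing, reads would be assigned back to samples by reading the tag at this position; uniqueness of the tags is what makes that assignment unambiguous.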

10 μl of universal PCR product from each of the samples were pooled and the pooled PCR product was purified using AMPure™ SPRI beads (Beckman-Coulter, Danvers, Mass.), and quantified using Quant-iT™ PicoGreen (Invitrogen, Carlsbad, Calif.).

Example 5 Detection and Analysis of Selected Loci

The purified PCR products of each assay format were sequenced on a single lane of a slide on an Illumina HiSeq 2000. Sequencing runs typically give rise to ~100M raw reads, of which ~85M (85%) map to expected assay structures. This translates to an average of ~885K reads/sample across the experiment, and (in the case of an experiment using 96 loci) 9.2K reads/replicate/locus across 96 loci. The mapped reads were parsed into replicate/locus/allele counts, and various metrics were computed for each condition, including:

Yield: a metric of the proportion of input DNA that was queried in sequencing, computed as the average number of unique reads per locus (only counting unique identification code reads per replicate/locus) divided by the total number of genomic equivalents contained in the input DNA.

80 percentile locus frequency range: a metric of the locus frequency variability in the sequencing data, interpreted as the fold range that encompasses 80% of the loci. It is computed on the distribution of total reads per locus, across all loci, as the 90th percentile of total reads per locus divided by the 10th percentile of the total reads per locus.

SNP error rate: a metric of the error rate at the SNP position, computed as the proportion of reads containing a discordant base at the SNP position.
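The three metrics defined above can be sketched in a few lines. The function names and the linear-interpolation percentile are assumptions made for illustration; the example does not say which percentile convention was used.

```python
# Illustrative sketch; names and the percentile convention are assumptions.
from statistics import mean


def percentile(values, pct):
    """Percentile by linear interpolation between closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def assay_yield(unique_reads_per_locus, genomic_equivalents):
    """Average unique reads per locus divided by input genomic equivalents."""
    return mean(unique_reads_per_locus) / genomic_equivalents


def locus_frequency_range_80(total_reads_per_locus):
    """90th percentile of total reads per locus divided by the 10th percentile."""
    return (percentile(total_reads_per_locus, 90)
            / percentile(total_reads_per_locus, 10))


def snp_error_rate(discordant_reads, total_reads_at_snp):
    """Proportion of reads with a discordant base at the SNP position."""
    return discordant_reads / total_reads_at_snp
```

A range of 5.0, for instance, means the middle 80% of loci span a five-fold difference in read counts; a value near 1 indicates little locus bias.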

The results for each assay format are summarized in Table 1:

TABLE 1
Results Summary of Tandem Ligation Assay Formats

ASSAY   FIXED SEQUENCE OLIGO  BRIDGING OLIGO    ENZYME     YIELD   80% LOC     SNP ERROR
FORMAT  (1st and/or 2nd)      USED              USED               FREQ RANGE  RATE
------  --------------------  ----------------  ---------  ------  ----------  ---------
1       LOCUS-SPECIFIC        Locus specific    pol + lig  9.5%    5.3         0.18%
2       LOCUS-SPECIFIC        No                pol + lig  1.4%    58.3        0.19%
3       ALLELE-SPECIFIC       No                pol + lig  0.4%    61.7        1.00%
4       ALLELE-SPECIFIC       Locus specific    Taq lig    5.0%    5.9         0.92%
4       ALLELE-SPECIFIC       Locus specific    T4 lig     5.3%    4.4         0.95%
5       LOCUS-SPECIFIC        No                Taq lig    22.5%   1.7         NA
6       LOCUS-SPECIFIC        Locus specific    Taq lig    12.5%   2.9         NA
7       LOCUS-SPECIFIC        Allele specific   Taq lig    14.3%   2.8         0.20%
8       LOCUS-SPECIFIC        2 Locus specific  Taq lig    18.5%   2.8         NA

Table 1 indicates that the locus-specific tandem ligation assay using a bridging oligo converted template DNA into targeted product with high yield (~10%), with a high proportion of product derived from targeted loci (15% of reads did not contain expected assay structures), with limited locus bias (80% of loci fall within a ~5-fold concentration range), and with high SNP accuracy (0.2% SNP error rate). The locus-specific tandem ligation assay without the use of a bridging oligo produced reduced yields and substantial locus bias, but still produced high accuracy SNP genotyping data. The allele-specific tandem ligation assay with a bridging oligo produced intermediate yields compared to the locus-specific assay using both T4 and Taq ligase, but still produced limited locus bias and high accuracy SNP genotyping data. The allele-specific tandem ligation assay without a bridging oligo produced reduced yields and substantial locus bias, but still produced high accuracy SNP genotyping data.

Assay formats five and six showed that template DNA can be converted into targeted product with high yield (12-16%), with a high proportion of product derived from targeted loci (~76% of reads contained expected assay structures), and with limited locus bias (80% of loci fall within a 2-3-fold concentration range). FIG. 14 illustrates the genotyping performance that is obtained using assay format seven, comparing the sequence counts for the two alleles of all polymorphic assays observed in a single sample. Note the clear separation of the homozygous and heterozygous clusters, as well as the low background counts observed amongst the homozygous clusters.

Example 6 Determination of Percent Fetal DNA Using Tandem Ligation

One exemplary assay system of the invention was designed comprising 480 separate interrogations, each utilizing the detection of different loci in a maternal sample. The initial example utilized a determination of percent fetal DNA in subjects carrying a male fetus, and so loci on the Y chromosome were utilized as well as loci containing a paternally-inherited fetal SNP that is different from the maternal sequence.

Specifically, 480 selected nucleic acids were interrogated using the assay system. The 480 selected nucleic acids comprised 48 sequence-specific interrogations of nucleic acids corresponding to loci on chromosome Y, 192 sequence-specific interrogations of nucleic acids corresponding to loci on chromosome 21, 192 sequence-specific interrogations of selected nucleic acids corresponding to loci on chromosome 18, and 144 sequence-specific interrogations of selected nucleic acids corresponding to polymorphic loci on chromosomes 1-16. These assays were designed based on human genomic sequences, and each interrogation used three oligos per selected nucleic acid interrogated in the assay.

The first oligo used for each interrogation was complementary to the 3′ region of the selected genomic region, and comprised the following sequential (5′ to 3′) oligo elements: a universal PCR priming sequence common to all assays: TACACCGGCGTTATGCGTCGAGAC (SEQ ID NO:1); an identification code specific to the selected loci comprising nine nucleotides; and a 20-24 bp sequence complementary to the selected genomic locus. This first oligo was designed for each selected nucleic acid to provide a predicted uniform Tm with a two degree variation across all interrogations in the 480 assay set.

The second oligo used for each interrogation was a bridging oligo complementary to the genomic locus sequence directly adjacent to the genomic region complementary to the first oligonucleotide. Based on the selected nucleic acids of interest, the bridging oligos were designed to allow utilization of a total of 12 oligonucleotide sequences that could serve as bridging oligos for all of the 480 interrogations in the assay set.

The third oligo used for each interrogation was complementary to the 5′ region of the selected genomic locus, and comprised the following sequential (5′ to 3′) elements: a 20-24 base sequence complementary to the 5′ region in the genomic locus; a hybridization breaking nucleotide which was different from the corresponding base in the genomic locus; and a universal PCR priming sequence which was common to all third oligos in the assay set: ATTGCGGGGACCGATGATCGCGTC (SEQ ID NO:2). This third oligo was designed for each selected nucleic acid to provide a predicted uniform Tm with a two degree variation across all interrogations in the 480 assay set, and the Tm range was substantially the same as the Tm range of the first oligo set.

All oligonucleotides were synthesized using conventional solid-phase chemistry. The first and bridging oligonucleotides were synthesized with 5′ phosphate moieties to enable ligation to 3′ hydroxyl termini of adjacent oligonucleotides. An equimolar pool of sets of the first and third oligonucleotides used for all interrogations in the multiplexed assay was created, and a separate equimolar pool of all bridging oligonucleotides was created to allow for separate hybridization reactions.

Genomic DNA was isolated from 5 mL plasma using the Dynal Silane viral NA kit (Invitrogen, Carlsbad, Calif.). Approximately 12 ng DNA was processed from each of 37 females, including 7 non-pregnant female subjects, 10 female subjects pregnant with males, and 22 female subjects pregnant with females. The DNA was biotinylated using standard procedures, and the biotinylated DNA was immobilized on a solid surface coated with streptavidin to allow retention of the genomic DNA in subsequent assay steps.

The immobilized DNA was hybridized to the first pool comprising the first and third oligos for each interrogated sequence under stringent hybridization conditions. The unhybridized oligos in the pool were then washed from the surface of the solid support, and the immobilized DNA was hybridized to the pool comprising the bridging oligonucleotides under stringent hybridization conditions. Once the bridging oligonucleotides were allowed to hybridize to the immobilized DNA, the remaining unbound oligos were washed from the surface and the three hybridized oligos bound to the selected nucleic acid regions were ligated using T4 ligase to provide a contiguous DNA template for amplification.

The ligated DNA was amplified from the solid substrate using an error correcting thermostable DNA polymerase, a first universal PCR primer TAATGATACGGCGACCACCGAGATCTACACCGGCGTTATGCGTCGAGA (SEQ ID NO:3) and a second universal PCR primer TCAAGCAGAAGACGGCATACGAGATXAAACGACGCGATCATCGGTCCCCGCAA (SEQ ID NO:4), where X represents one of 96 different sample indices used to uniquely identify individual samples prior to pooling and sequencing. 10 μL of universal PCR product from each of the 37 samples described above were pooled, and the pooled PCR product was purified using AMPure SPRI beads (Beckman-Coulter, Danvers, Mass.), and quantified using Quant-iT™ PicoGreen (Invitrogen, Carlsbad, Calif.).

The purified PCR product was sequenced on 6 lanes of a single slide on an Illumina HiSeq™ 2000. The sequencing run gave rise to 384M raw reads, of which 343M (89%) mapped to expected genomic loci, resulting in an average of 3.8M reads per sample across the 37 samples, and 8K reads per sample per locus across the 480 loci. The mapped reads were parsed into sample and locus counts, and two separate metrics of percent fetal DNA were computed as follows.

Percent male DNA detected by chromosome Y loci corresponds to the relative proportion of reads derived from chromosome Y locus interrogations versus the relative proportion of reads derived from autosomal locus interrogations, and is computed as (number of chromosome Y reads in a test subject/number of autosome reads in the test subject)/(number of chromosome Y reads in a male control subject/number of autosome reads in the male control subject). This metric was used as a measure of percent fetal DNA in the case of a male fetus using the relative reads of chromosome Y.
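The chromosome Y metric above reduces to a ratio of two ratios. A minimal sketch follows; the function and argument names are hypothetical, and scaling the result by 100 to express it as a percentage is an assumption about how the value was reported.

```python
# Illustrative sketch; names and the x100 scaling are assumptions.
def percent_male_dna(test_y_reads, test_autosome_reads,
                     control_y_reads, control_autosome_reads):
    """Chromosome Y read fraction in a test subject relative to a male control."""
    test_ratio = test_y_reads / test_autosome_reads
    control_ratio = control_y_reads / control_autosome_reads
    return 100.0 * test_ratio / control_ratio
```

Normalizing to a male control accounts for the fact that chromosome Y loci and autosomal loci need not yield the same number of reads per copy; a sample that is entirely male DNA would score approximately 100 under this scaling.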

Percent fetal DNA detected by polymorphic loci corresponds to the proportion of reads derived from non-maternal versus maternal alleles at loci where such a distinction can be made. First, for each identified locus, the number of reads for the allele with the fewest counts (the low frequency allele) was divided by the total number of reads to provide a minor allele frequency (MAF) for each locus. Then, loci with an MAF between 0.075% and 15% were identified as informative loci. The estimated percent fetal DNA for the sample was calculated as the mean of the minor allele frequency of the informative loci multiplied by two, i.e. computed as 2× average (MAF) occurrence where 0.075% < MAF < 15%.
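The polymorphic-locus estimate can be sketched as follows, using the thresholds stated above. The function names, the representation of each locus as a pair of allele read counts, and the choice to return None when no locus is informative are assumptions made for illustration.

```python
# Illustrative sketch; names, input layout, and the None fallback are assumptions.
from statistics import mean


def minor_allele_frequency(count_a, count_b):
    """Reads for the lower-count allele divided by total reads at the locus."""
    return min(count_a, count_b) / (count_a + count_b)


def percent_fetal_dna(allele_counts, low=0.00075, high=0.15):
    """Twice the mean MAF over informative loci (low < MAF < high), in percent."""
    mafs = [minor_allele_frequency(a, b)
            for a, b in allele_counts if a + b > 0]
    informative = [m for m in mafs if low < m < high]
    if not informative:
        return None  # no locus where fetal and maternal alleles can be told apart
    return 100.0 * 2 * mean(informative)
```

The factor of two reflects that, at a locus where the mother is homozygous and the fetus carries one paternally inherited allele, only half of the fetal DNA contributes the minor allele. Loci above the upper threshold are presumably excluded because the mother herself is likely heterozygous there.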

FIG. 15 demonstrates the results from these computations. As shown in FIG. 15, the percent male loci determined using the above-described chromosome Y metrics (grey circles) can separate pregnancies involving male fetuses from pregnancies involving female fetuses (grey diamonds) and non-pregnant samples (black circles). In addition, computation of the percent fetal amount in a sample by polymorphic loci metric can distinguish pregnant samples from non-pregnant samples. Finally, there is a correlation between the percent fetal DNA estimates for a sample obtained from chromosome Y and polymorphic loci in pregnancies involving male fetuses. This correlation persists down to quite low percent fetal values.

While this invention is satisfied by aspects in many different forms, as described in detail in connection with preferred aspects of the invention, it is understood that the present disclosure is to be considered as exemplary of the principles of the invention and is not intended to limit the invention to the specific aspects illustrated and described herein. Numerous variations may be made by persons skilled in the art without departure from the spirit of the invention. The scope of the invention will be measured by the appended claims and their equivalents. The abstract and the title are not to be construed as limiting the scope of the present invention, as their purpose is to enable the appropriate authorities, as well as the general public, to quickly determine the general nature of the invention. In the claims that follow, unless the term “means” is used, none of the features or elements recited therein should be construed as means-plus-function limitations pursuant to 35 U.S.C. §112, ¶6.

We claim:
 1. A method for detecting a variance in the frequency of a genomic region in a genetic sample, comprising the steps of: providing a genetic sample; introducing at least two sets of first and second fixed sequence oligonucleotides to the genetic sample under conditions that allow the sets of fixed sequence oligonucleotides to specifically hybridize to complementary regions in nucleic acid regions of interest, wherein each set comprises an oligonucleotide associated with an optically detectable label, and wherein both sets comprise a region that binds selectively to a single array feature; ligating the hybridized oligonucleotides to create contiguous ligation products complementary to nucleic acid regions of interest; introducing the contiguous ligation products from both sets to an array comprising one or more features complementary to the contiguous ligation products; and detecting hybridization of the contiguous ligation products from the first and second set to the array by detection of the optically detectable labels; wherein the relative frequency of the optically detectable labels on the array is indicative of the presence or absence of a variance in the frequency of a nucleic acid region of interest in the genetic sample.
 2. The method of claim 1, further comprising amplifying the contiguous ligation products prior to introduction to the array.
 3. The method of claim 2, wherein one or both of the first or second fixed sequence oligonucleotides of each set comprise universal primer regions.
 4. The method of claim 1, further comprising introducing one or more bridging oligonucleotides for each set under conditions that allow the bridging oligonucleotides to hybridize to complementary regions in the nucleic acid regions of interest between the first and second fixed sequence oligonucleotides to create hybridization products.
 5. The method of claim 4, wherein the hybridization products of the fixed sequence oligonucleotides and the nucleic acid regions of interest to which they hybridize are isolated prior to introduction of the bridging oligonucleotides.
 6. The method of claim 4, wherein the one or more bridging oligonucleotides are introduced simultaneously with the first and second fixed sequence oligonucleotides.
 7. The method of claim 1, wherein hybridization of the contiguous ligation products comprises hybridization to individual oligonucleotides bound to the array.
 8. The method of claim 1, wherein the label is a fluorescent label.
 9. The method of claim 1, wherein the variance is detected by a variation from the expected ratio of nucleic acids of interest in the genetic sample.
 10. The method of claim 9, wherein the variance is detected by an increased or decreased level of hybridization of one set of contiguous ligation products as compared to the second set of contiguous ligation products.
 11. A method for detecting regions of interest corresponding to a first and second chromosome in a genetic sample, comprising the steps of: providing a genetic sample; introducing at least two sets of first and second fixed sequence oligonucleotides to the genetic sample under conditions that allow the sets of fixed sequence oligonucleotides to specifically hybridize to complementary regions in nucleic acid regions of interest, wherein the first set of fixed sequence oligonucleotides is complementary to a genomic region on a first chromosome and the second set of fixed sequence oligonucleotides is complementary to a genomic region on a second chromosome, and wherein each set comprises an oligonucleotide associated with an optically detectable label, and wherein both sets comprise a region that binds selectively to a single array feature; ligating the hybridized oligonucleotides to create contiguous ligation products complementary to nucleic acid regions of interest; introducing the contiguous ligation products from both sets to an array comprising one or more features complementary to the contiguous ligation products; and detecting hybridization of the contiguous ligation products from the first and second set to the array by detection of the optically detectable labels; wherein the relative frequency of the optically detectable labels on the array is indicative of the presence or absence of a variance in the frequency of a first and second chromosome in the genetic sample.
 12. The method of claim 11, further comprising amplifying the contiguous ligation products prior to introduction to the array.
 13. The method of claim 12, wherein one or both of the first or second fixed sequence oligonucleotides of each set comprise universal primer regions.
 14. The method of claim 11, further comprising introducing one or more bridging oligonucleotides for each set under conditions that allow the bridging oligonucleotides to hybridize to complementary regions in the nucleic acid regions of interest between the first and second fixed sequence oligonucleotides.
 15. The method of claim 14, wherein the hybridization products of the fixed sequence oligonucleotides and the nucleic acid regions of interest to which they hybridize are isolated prior to introduction of the bridging oligonucleotides.
 16. The method of claim 14, wherein the one or more bridging oligonucleotides are introduced simultaneously with the first and second fixed sequence oligonucleotides.
 17. The method of claim 11, wherein hybridization of the contiguous ligation products comprises hybridization to individual oligonucleotides bound to the array.
 18. The method of claim 11, wherein the label is a fluorescent label.
 19. The method of claim 11, wherein the variance is detected by a variation from the expected ratio of nucleic acids of interest in the genetic sample.
 20. The method of claim 19, wherein the variance is detected by an increased or decreased level of hybridization of one set of contiguous ligation products as compared to the second set of contiguous ligation products.
 21. The method of claim 11, wherein the method is carried out for at least ten sets of fixed sequence oligonucleotides complementary to a genomic region on a first chromosome and at least ten sets of fixed sequence oligonucleotides complementary to a genomic region on a second chromosome.
 22. The method of claim 21, wherein the method is carried out for at least 100 sets of fixed sequence oligonucleotides complementary to a genomic region on a first chromosome and at least 100 sets of fixed sequence oligonucleotides complementary to a genomic region on a second chromosome.
 23. The method of claim 22, wherein the method is carried out for at least 200 sets of fixed sequence oligonucleotides complementary to a genomic region on a first chromosome and at least 200 sets of fixed sequence oligonucleotides complementary to a genomic region on a second chromosome.
 24. The method of claim 23, wherein the method is carried out for at least 500 sets of fixed sequence oligonucleotides complementary to a genomic region on a first chromosome and at least 500 sets of fixed sequence oligonucleotides complementary to a genomic region on a second chromosome.